IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem. That's essentially where we are today both in language repositories for OSS languages and private monorepos.
This is partly due to how we've distributed software over the last 40 years. In the 80s, the idea of a library of functionality was something you paid for, and painstakingly included parts of it into your size-constrained environment (it had to fit on a floppy). You probably picked apart that library and pulled out the bits you needed, integrating them into your builds to be as small as possible.
Today we pile libraries on top of libraries on top of libraries. It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all 'foolib' contains?
At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?
The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
It's a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching from your dependencies and then dragging it around like a boat anchor of dead code?
> IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem.
Go and C# (.NET) are counterexamples. They both have great ecosystems and package management that is just as simple and effective as Rust's or JS's (Node). But neither Go nor C# has issues with dependency hell like Rust or, even more so, JavaScript, because they have exceptional std libs and even large frameworks like ASP.NET or EF Core.
A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as a counterexample. But again, Go and C# are proving them wrong. A great std lib is a solution, but one that requires a huge effort that can only be made by large organisations like Google (Go) or Microsoft (C#).
> A large stdlib solves the problems the language is focused on
That's part of it, but it also solves the problem of vetting. When I use the Go stdlib I don't have to personally spend time vetting it like I do when looking at a crate or npm package.
In general, Go & Rust packages on GitHub are high quality to begin with, but there is still a pronounced difference between open-source packages and what is approved to be part of the language's own stdlib.
It's nice to know thousands of different companies already found the issues for me or objected to them in reviews before the library was published.
But I agree that graphics is often overlooked in std libs. However that’s a bit of a different beast. Std libs typically deal with what the OS provides. Graphics is its own world so to speak.
As for Wasm: first, that’s a runtime issue and not a language issue. I think GC is on the roadmap for Wasm. Second, Go and C# obviously predate Wasm.
In the end, not every language should be concerned with every use case. The bigger question is whether it provides a std lib for the category of programs it targets.
To take a specific example: JS isn't great at efficiently and conveniently generating dynamic HTML. You can go far with no (or minimal) dependencies and some clever patterns. But a lot of pain and work hours would have been saved if it had something people want to use out of the box.
You don't consider games, desktop and mobile applications big use cases, each being multi billion industries?
I don't know man, I feel like you're arguing in bad faith and are intentionally ignoring what athrowaway3z said: it works there because they're essentially languages specifically made to enable web development. That's why their standard lib is plenty for this domain.
I can understand that web development might be the only thing you care about though, it's definitely a large industry, but the thesis of a large standard lib solving the dependency issue really isn't true, as (almost) every other use case beyond web development shows.
Specifically, those languages are back-end focused, so about 28% of developers. 55% focus on front end. If you add up games, desktop, and mobile, oddly you get 28% as well. So not bigger, but the same size. Good intuition! That leaves out embedded (8%) and systems (8-12%), which are probably more what Rust is used for. There is obviously overlap, and we haven't mentioned database or scientific programming, at 12 and 5 percent respectively.
Edit: after rereading this I feel like I may have come across as sarcastic; I was legitimately impressed that a guess without looking it up would peg the ratio that closely. It was off topic as a response too. So I'll add that Rust never would have had an async as good as Tokio, or been able to have async in embedded as with Embassy, if it hadn't opted for batteries excluded. I think this was the right call given its initial focus as a desktop/systems language, and it is what allowed it to be more than that as people added things. Use cargo-deny, and pin the oldest version that does what you need and doesn't fail cargo-deny. There are several hundred crates brought in by just the rust-lang repo; if you only vet things not in that list, you can save some time too.
"Web server" is, more or less, about converting a database into JSON and/or HTML. There are complexities there, sure, but it's not like it's some uniquely monumental undertaking compared to other fields.
Not all web servers deal in HTML or JSON, many don't have databases outside of managing their internal state.
Even ignoring that, those are just common formats. They don't tell you what a particular web server is doing.
Take a few examples of some Go projects that either are web servers or have them as major components like Caddy or Tailscale. Wildly different types of projects.
I guess one has to expand "web server" to include general networking as well, which is definitely a well supported use case or rather category for the Go std lib, which was my original point.
Just to explain this confusion, the term “web server” typically refers specifically to software that is listening for HTTP requests, such as apache or nginx. I would use the term “application server” to refer to the process that is processing requests that the web server sends to it. I read “web server” in their comment as “application server” and it makes sense.
Yes. That's the same distinction I would expect. Although I'm not sure that the database stuff is the role I'd usually look for in the application server itself.
The libraries you listed are too specialized. And they require integration with asset pipeline which is well outside of scope of a programming language.
As for the generic things, I think C# is the only mainstream language which has small vectors, 3x2 and 4x4 matrices, and quaternions in the standard library.
To be fair, there is no language that has a framework that contains all of these things... unless you're using one of the game engines like Unity/Unreal.
If you're willing to constrain yourself to 2D games, and exclude physics engines (assume you just use one of the Box2D bindings) and also UI (2D gamedevs tend to make their own UI systems anyway)... Then your best bet in the C# world is Monogame (https://monogame.net/), which has lots of successful titles shipped on desktop and console (Stardew Valley, Celeste)
> To be fair, there is no language that has a framework that contains all of these things.
Depends. There is GDScript, seeing how it comes with a game engine.
But original claim was
> actually dotnet also does not need too many dependencies for games and desktop apps.
If you're including languages with big game engines, it's a tautology: languages with good game engines have good game engines.
But a general-purpose programming language has very little to gain from including a niche library, even if it's the best in the business. Imagine if C++ shipped with Unreal.
>A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as a counterexample.
Python's standard library is big. I wouldn't call it great, because Python is over 30 years old and it's hard to add things to a standard library and even harder to remove them.
I'm thankful argparse exists in Python's stdlib. But argument parsing is not that hard, especially for simpler programs. Programmers should be able to think for a minute and figure it out instead of always reaching for clap; that's how you get dependency hell.
Argument parsing, in particular, is a great place to start realizing that you can implement what you need without adding a dozen dependencies.
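To make that concrete, here's a minimal sketch of hand-rolled flag parsing in Rust using only the standard library (the `--verbose` flag and the single positional argument are made up for illustration):

```rust
// Minimal hand-rolled flag parsing with std only: one flag, one positional arg.
use std::env;

fn main() {
    let mut verbose = false;
    let mut input: Option<String> = None;

    for arg in env::args().skip(1) {
        match arg.as_str() {
            "--verbose" | "-v" => verbose = true,
            "--help" | "-h" => {
                eprintln!("usage: mytool [--verbose] <input>");
                return;
            }
            // Anything not starting with '-' is treated as the positional argument.
            other if !other.starts_with('-') => input = Some(other.to_string()),
            other => {
                eprintln!("unknown flag: {other}");
                std::process::exit(2);
            }
        }
    }

    let input = input.unwrap_or_else(|| {
        eprintln!("missing <input> argument");
        std::process::exit(2);
    });
    println!("verbose={verbose}, input={input}");
}
```

Obviously this doesn't cover `--flag=value`, help text generation, or subcommands, which is exactly the trade-off being debated here.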
Hard disagree.
Standardized flag parsing is a blessing on us all; I do not want to have to figure out which of the many flag conventions the author picked to implement, like one does with non-getopt C programs.
I don't disagree with the principle, there are a lot of trivial Python deps, but rolling your own argument parsing is not the way.
Again, argument parsing is not that hard most of the time. You don't have to make your own conventions. That's just weird.
If you've never thought about it, it might seem like you need an off-the-shelf dependency. But as programmers, sometimes we should think a bit more before we make that decision.
Argument parsing is absolutely the kind of thing where I'd reach for a third-party library if the standard library didn't provide (and in Python's case, maybe even then - argparse has some really unpleasant behaviours). When you look through library code, it might seem like way more than you'd write yourself, and it probably is. But on a conceptual level you'll probably actually end up using a big chunk of it, or at least see a future use for it. And it doesn't tend to pull in a lot of dependencies. (For example, click only needs colorama, and then only on Windows; and that doesn't appear to bring in anything transitively.)
It's a very different story with heavyweight dependencies like Numpy (which include reams of tests, documentation and headers even in the wheels that people are only installing to be a dependency of something else, and covers a truly massive range of functionality including exposing BLAS and LAPACK for people who might just want to multiply some small matrices or efficiently represent an image bitmap), or the more complex ones that end up bringing in multiple things completely unrelated to your project that will never be touched at runtime. (Rich supports a ton of wide-ranging things people might want to do with text in a terminal, and I would guess most clients probably want to do exactly one of those things.)
You can, but there's always a tradeoff. As soon as I've added about the 3rd argument, I always wish I had grabbed a library, because I'm not getting paid to reinvent this wheel.
While not everything in Python's stdlib is great (I am looking at you urllib), I would say most of it is good enough. Python is still my favorite language to get stuff done exactly because of that.
My personal language design is strongly inspired by what I imagine a Python 4 would look like (but also takes hints from other languages, and some entirely new ideas that wouldn't fit neatly in Python).
I don’t want a large std lib. It stifles competition and slows the pace of development. Let libraries rise and fall on their own merits. The std lib should limit itself to the basics.
> but neither Go nor C# has issues with dependency hell like Rust or, even more so, JavaScript, because they have exceptional std libs
They also have a much narrower scope of use, which makes it easier to create a stdlib usable for most people. You can't do that with a more generic language.
I would say C# gets used for almost everything at Microsoft: GUIs, backends, DirectX tooling (the new PIX UI, Managed DirectX and XNA back in the Creative Arcade days), Azure, ..., alongside C++, and even if Microsoft <3 Rust, in much bigger numbers.
Indeed, it has no bearing on binary size at all, because none of it will be included. If you are coming from the perspective where the standard library is entirely unusable to begin with, then improving the standard library is irrelevant at best. It also likely means that at least some time and effort will be taken away from improving the things that you can use to be spent on improving a bunch of things that you can't use.
I feel like this is an organizational problem much more than a technical one, though. Rust can be different things to different people, without necessarily forcing one group to compromise overmuch. But some tension is probably inevitable.
> Indeed, it has no bearing on binary size at all, because none of it will be included.
That depends on the language. In an interpreted language (including JIT), or a language that depends on a dynamically linked runtime (e.g. C and C++), it isn't directly included in your app because it is part of the runtime. But you need the runtime installed, and if your app is the only thing that uses that runtime, then the runtime size effectively adds to your installation size.
In languages that statically link the standard library, like Go and Rust, it absolutely does impact binary size, although the compiler might use some methods to try to avoid including parts of the standard library that aren't used.
Embedded Rust usually means no_std Rust, in which case no, neither the standard library nor any runtime to support it get included in the resulting binary. This isn't getting externalized either; no_std code simply cannot use any of the features that std provides. It is roughly equivalent to freestanding C.
What you say is true enough for external-runtime languages and Go, though TinyGo is available for resource-constrained environments.
Well, Rust's standard library has three components, named core, alloc and std
no_std Rust only has core, but this is indeed a library of code; freestanding C does not provide such a thing: the freestanding C stdlib provides no functions, just type definitions and other stuff which evaporates when compiled.
Two concrete examples to go along with that: suppose we have a mutable foo; in Rust it's maybe foo: [i32; 40] (forty 32-bit signed integers), or in C maybe it's int foo[40];.
In freestanding C that's fine, but we're not provided with any library code to do anything with foo; we can use the core language features to write it ourselves, but nothing is provided.
Rust will happily foo.sort_unstable(); this is a fast custom in-place sort, roughly a modern form of introspective sort written for Rust by its creators and because it's in core, that code just goes into your resulting embedded firmware or whatever.
Now, suppose we want to perform a filter-map operation over that array. In C, once again you're left to figure out how to write that yourself; in Rust, foo impls IntoIterator, so you can use all the nice iterator features, and the algorithms just get baked into your firmware during compilation.
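A minimal sketch of that point as a `no_std` library crate (the function itself is made up, but everything it calls lives in `core`):

```rust
#![no_std]

/// Sort the buffer in place, then filter-map over it without allocating:
/// keep the positive values, double them, and sum the result.
/// `sort_unstable`, `iter`, `filter_map`, and `sum` all live in `core`,
/// so this builds for bare-metal targets with no libstd and no allocator.
pub fn demo(foo: &mut [i32; 40]) -> i32 {
    foo.sort_unstable();
    foo.iter()
        .filter_map(|&x| if x > 0 { Some(x * 2) } else { None })
        .sum()
}
```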
I think this is partially true, but more nuanced than just saying that Rust std lib is lacking.
Compared to Go and C#, the Rust std lib is mostly lacking:
- a powerful HTTP lib
- serialization
But Rust's approach (no runtime, no GC, no reflection) makes it very hard to provide those libraries.
Within these constraints, some high quality solutions emerged, Tokio, Serde. But they pioneered some novel approaches which would have been hard to try in the std lib.
The whole async ecosystem still has a beta vibe, giving the feeling of programming in a different language.
Procedural macros are often synonymous with slow compile times and code bloat.
But what we gained is fewer runtime errors, more efficiency, and a more robust language.
TLDR: trade-offs everywhere, it is unfair to compare to Go/C# as they are languages with a different set of constraints.
Having some of those listed libraries in the std lib and then not being able to change the API or the implementation is what killed modern C++ adoption (along with the language being a patchwork on top of C).
As some of the previous commenters said, when you focus your language to make it easy to write a specific type of program, then you make tradeoffs that can trap you in those constraints like having a runtime, a garbage collector and a set of APIs that are ingrained in the stdlib.
Rust isn't like that. As a systems programmer I want none of them. Rust is a systems programming language. I wouldn't use Rust if it had a bloated stdlib. I am very happy about its stdlib. Being able to swap out the regex, datetime, arg-parsing and encoding libraries is a feature. I can choose memory-heavy or CPU-heavy implementations. I can optimize for code size or performance, or sometimes neither/both.
If the trade-offs were made to appease the easy (web/app) development, it wouldn't be a systems programming language for me where I can use the same async concepts on a Linux system and an embedded MCU. Rust's design enables that, no other language's design (even C++) does.
If a web developer wants to use a systems programming language, that's their trade-off for a harder-to-program language. Type safety similar to Rust's is provided by Kotlin or Swift.
Dependency bloat is indeed a problem. Easy inclusion of dependencies is also a contributing factor. This problem can be solved by making dependencies and features granular. If the libraries don't provide the granularity you want, you need to change libraries/audit source/contribute. No free meals.
Yeah I’ve encountered the benefit of this approach recently when writing WASM binaries for the web, where binary size becomes something we want to optimize for.
The de facto standard regex library (which is excellent!) brings in nearly 2 MB of additional content for correct unicode operations and other purposes. The same author also makes regex-lite, though, which did everything we need, with the same interface, in a much smaller package. It made it trivial to toss the functionality we needed behind a trait and choose a regex library appropriately in different portions of our stack.
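Roughly, the pattern looks like this; a sketch assuming the `regex` and `regex-lite` crates (which expose the same `Regex::new`/`is_match` surface), with a made-up `full-unicode` Cargo feature choosing between them:

```rust
// Hide the engine behind a small local trait and let a Cargo feature decide
// which crate backs it. The feature name and the choice of crates here are
// illustrative; the optional dependencies would be declared in Cargo.toml.
pub trait Matcher {
    fn is_match(&self, haystack: &str) -> bool;
}

#[cfg(feature = "full-unicode")]
impl Matcher for regex::Regex {
    fn is_match(&self, haystack: &str) -> bool {
        // Fully qualified call to the inherent method on regex::Regex.
        regex::Regex::is_match(self, haystack)
    }
}

#[cfg(not(feature = "full-unicode"))]
impl Matcher for regex_lite::Regex {
    fn is_match(&self, haystack: &str) -> bool {
        regex_lite::Regex::is_match(self, haystack)
    }
}
```

Callers that only need `Matcher` never care which engine is behind it, which is what made the swap trivial.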
Indeed. However, you need to recognize that having those features in the stdlib creates a huge bias against swapping them out. How many people in Java actually use alternative DB APIs to JDBC? How many alternative encoding libraries are out there for JSON in Go? How about async runtimes, can you replace that in Go easily?
> Procedural macros are often synonymous with slow compile times and code bloat.
In theory they should reduce it, because you wouldn't write proc macros to generate code you don't need…right? How much coding time do you save with macros compared to implementing things manually?
To be fair, I think Rust has a very healthy selection of options for both, with Serde and Reqwest/Hyper being the de-facto standards.
Rust has other challenges it needs to overcome but this isn't one.
I'd put Go behind both C#/F# and Rust in this area. It has spartan tooling in odd areas it's expected to be strong at, like gRPC, and the serialization story in Go is quite a bit more painful and bare-bones compared to what you get out of System.Text.Json and Serde.
The difference is especially stark with regex, where Go ships with a slow engine (because it does not allow writing sufficiently fast code in this area at this moment) whereas both Rust and C# have top-of-the-line implementations, each of which beats every other engine save for Intel Hyperscan[0].
> (because it does not allow writing sufficiently fast code in this area at this moment)
I don't think that's why. Or at least, I don't think it's straight-forward to draw that conclusion yet. I don't see any reason why the lazy DFA in RE2 or the Rust regex crate couldn't be ported to Go[1] and dramatically speed things up. Indeed, it has been done[2], but it was never pushed over the finish line. My guess is it would make Go's regexp engine a fair bit more competitive in some cases. And aside from that, there's tons of literal optimizations that could still be done that don't really have much to do with Go the language.
Could a Go-written regexp engine be faster or nearly as fast because of the language? Probably not. But I think the "implementation quality" is a far bigger determinant in explaining the current gap.
> At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
I'm not convinced that happens that often.
As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the times it turned out that they were downstream of things we needed: Vulkan support, PNG decoding, unicode shaping, etc.
When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).
We're looking to remove or at least decouple larger dependencies like winit and wgpu, but that requires some major architectural changes, it's not just "remove this runtime option and win 500MB".
Not in Rust, but I've seen it with Python in scientific computing. Someone needs to do some minor matrix math, so they install numpy. Numpy isn't so bad, but if installing it via conda it pulls in MKL, which sits at 171MB right now (although I have memories of it being bigger in the past). It also pulls in intel-openmp, which is 17MB.
> Someone needs to do some minor matrix math, so they install numpy
I’m just not convinced that it’s worth the pain to avoid installing these packages.
You want speedy matrix math. Why would you install some second rate package just because it has a lighter footprint on disk? I want my dependencies rock solid so I don’t have to screw with debugging them. They’re not my core business - if (when) they don’t “just work” it’s a massive time sink.
NumPy isn’t “left pad” so this argument doesn’t seem strong to me.
Because Rust pays the price of compiling everything from scratch on a release build, you can pay a little extra to turn on link-time optimization and turn off parallel codegen on release builds, and absolutely nothing gets compiled in that you don't use, and nothing gets repeated. Also, with symbols stripped, you can take something with Tokio, Clap, Serde, and nalgebra (matrix stuff) and still have a 2-5 MB binary. That is still huge to me because I'm old, but you can get it smaller if you want to recompile std along with your other dependencies.
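For reference, those knobs map to a few lines in Cargo.toml; a sketch (exact settings are a matter of taste, and recompiling std itself additionally needs nightly's -Z build-std):

```toml
# Release-profile settings along the lines described above.
[profile.release]
lto = true          # fat link-time optimization across the whole dependency graph
codegen-units = 1   # give up parallel codegen so nothing is duplicated across units
strip = "symbols"   # strip symbol and debug info from the final binary
```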
MKL is usually what you want if you are doing matrix math on an Intel CPU.
A better design is to make it easy for you to choose or hot-swap your BLAS/LAPACK implementation. E.g. OpenBLAS for AMD.
Edit: To be clear, Netlib (the reference implementation) is almost always NOT what you want. It's designed to be readable, not optimized for modern CPUs.
Others have made similar comments, but tree-shaking, symbol culling and anything else that removes dead code after it's already been distributed and/or compiled is too late IMO. It's a band-aid on the problem. A useful and pragmatic band-aid today for sure, but it fundamentally bothers me that we have to spend time compiling code and then spend more time analyzing it and ripping it back out.
Part of the issue I have with the dependency bloat is how much effort we currently go through to download, distribute, compile, lint, typecheck, whatever 1000s of lines of code we don't want or need. I want software that allows me to build exactly as much as I need and never have to touch the things I don't want.
> Others have made similar comments, but tree-shaking, symbol culling and anything else that removes dead code after it's already been distributed and/or compiled is too late IMO.
Why, in principle, wouldn't the same algorithms work before distribution?
For that matter, check out the `auditwheel` tool in the Python ecosystem.
As others have pointed out elsewhere, that only removes static dependencies. If you have code paths that are used depending on dynamic function arguments, static analysis is unable to catch those.
For example, you have a function calling XML or PDF or JSON output functions depending on some output format parameter. That's three very different paths and includes, but if you don't know which values that parameter can take during runtime you will have to include all three paths, even if in reality only XML (for example) is ever used.
Or there may be higher level causes outside of any analysis, even if you managed a dynamic one. In a GUI, for example, it could be functionality only ever seen by a few with certain roles, but if there is only one app everything will have to be bundled. Similar scenarios are possible with all kinds of software, for example an analysis application that supports various input and output scenarios. It's a variation of the first example where the parameter is internal, but now it is external data not available for an analysis because it will be known only when the software is actually used.
The situation isn't quite as dire as you portray. Compilers these days can also do devirtualization. The consequent static calls can become input to tree shaking in the whole program case. While it's true that we can't solve the problem in general, there's hope for specific cases.
Way back when, I used to vendor all the libraries for a project (Java/C++/Python) into a monorepo and integrate building everything into the project's build files, so anyone could rebuild the entire app stack with whatever compiler flags they wanted.
It worked great, but it took diligence, it also forces you to interact with your deps in ways that adding a line to a deps file does not.
One nice thing about cargo is that it builds all your code together, which means you can pass a unified set of flags to everything. The feature of building everything all the time as a whole has a bunch of downsides, many which are mentioned elsewhere, but the specific problem of not being able to build dependencies the way you want isn't one.
This is the default way of doing things in the monorepo(s) at Google.
It feels like torture until you see the benefits, and the opposite ... the tangled mess of multiple versions and giant transitive dependency chains... agony.
I would prefer to work in shops that manage their dependencies this way. It's hard to find.
I've never seen a place that does it quite like Google. Is there one? It only works if you have one product or are a giant company as it's really expensive to do.
Being able to change a dependency very deep and recompile the entire thing is just magic though. I don't know if I can ever go back from that.
It absolutely is so, or was for the 10 years I was there. I worked on Google3 (in Ads, on Google WiFi, on Fiber, and other stuff), in Chromium/Chromecast, Fiber, and on Stadia, and every single one of those repos -- all different repositories -- used vendored deps.
Yet the Maven repository is still not that bloated, even after 20+ years of Java et al. being among the most popular languages.
Compare that to Rust, where my experience with protobuf libs some time ago was that there is a choice of not one but three different libraries, one of which doesn't support services, another didn't support the syntax we had to support, and the third was unmaintained. So out of three choices not a single one worked.
Compare that to Maven, where you have only one officially supported choice that works well and is well maintained.
No, there were never several unofficial libraries, one of which eventually won the popularity contest. There was always only one official one. There is some barrier to adding your project there, so maybe that helped.
It's even more pronounced with the main Java competitor: .NET. They look at what approach won in the Java ecosystem and go all in. For example, there were multiple ORM tools competing, and Microsoft adopted the most popular one. So it's an even easier choice there, well supported and maintained.
This works very well until different parts of the deps tree start pulling the same Foo with slightly different flags/settings. Often for the wrong reasons, but sometimes for the right ones, and then it's a new kind of “fun”. Sometimes the build system is there to help you, but sometimes you are on your own. Native languages like C++ bring a special kind of joy called ODR violations to the mix…
> It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running.
It's effectively an end-run around the linker.
It used to be that you'd create a library by having each function in its own compilation unit: you'd create a ".o" file, then you'd bunch them together in a ".a" archive. When someone else is compiling their code and they need the do_thing() function, the linker sees it's unfulfilled and plucks it out of the foolib.a archive. For namespacing you'd probably call the functions foolib_do_thing(), etc.
However, object-orientism with a "god object" is a disease. We go in through a top-level object like "foolib" that holds pointers to all its member functions like do_thing(), do_this(), do_that(), then the only reference the other person's code has is to "foolib"... and then "foolib" brings in everything else in the library.
It's not possible for the linker to know if, for example, foolib needed the reference to do_that() just to initialise its members, and then nobody else ever needed it, so it could be eliminated, or if either foolib or the user's code will somehow need it.
> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
I can say that, at least for Go, it has excellent dead code elimination. If you don't call it, it's removed. If you even have a const feature_flag = false and have an if feature_flag { foobar() } in the code, it will eliminate foobar().
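For what it's worth, the same constant folding happens in Rust (the article's language); a minimal sketch, assuming an optimized release build:

```rust
// With a constant flag, release builds fold the branch away, so
// `heavy_feature` (and anything only it calls) never reaches the binary.
const FEATURE_FLAG: bool = false;

fn heavy_feature() {
    println!("imagine a large optional subsystem here");
}

fn main() {
    if FEATURE_FLAG {
        heavy_feature();
    }
    println!("lean default path");
}
```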
>At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
So what is the compiler doing that it doesn't remove unused code?
"dependency" here I guess means something higher-level that your compiler can't make the assumption you will never use.
For example you know you will never use one of the main functions in the parsing library with one of the arguments set to "XML", because you know for sure you don't use XML in your domain (for example you have a solid project constraint that says XML is out of scope).
Unfortunately the code dealing with XML in the library is 95% of the code, and you can't tell your compiler I won't need this, I promise never to call that function with argument set to XML.
Why can't the compiler detect that it will not be used? Tree shaking is well implemented in JavaScript compilers, an ecosystem which extensively suffers from this problem.
It should be possible to build a dependency graph and analyze which functions might actually end up in the scope. After all the same is already done for closures.
You as the implementer might know the user will never input xml, so doc_format can't be 'xml' (you might even add some error handling if the user inputs this), but how can you communicate this to the compiler?
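In Rust specifically, one way to communicate it is a Cargo feature: the XML path then isn't merely optimized away, it is never compiled in unless the consumer opts in. A sketch with a hypothetical `xml` feature (which would be declared in the library's Cargo.toml) and made-up render functions:

```rust
// Hypothetical library code: the XML variant and its code path only exist
// in the build when the consumer enables the (made-up) `xml` Cargo feature.
pub enum OutputFormat {
    Json,
    #[cfg(feature = "xml")]
    Xml,
}

pub fn render(doc: &str, format: OutputFormat) -> String {
    match format {
        OutputFormat::Json => render_json(doc),
        #[cfg(feature = "xml")]
        OutputFormat::Xml => render_xml(doc),
    }
}

fn render_json(doc: &str) -> String {
    format!("{{\"doc\":\"{doc}\"}}")
}

#[cfg(feature = "xml")]
fn render_xml(doc: &str) -> String {
    format!("<doc>{doc}</doc>")
}
```

The downside, as discussed elsewhere in the thread, is that the library author has to anticipate and maintain these feature boundaries.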
What you're calling "tree shaking" is more commonly called "dead code elimination" in compilers, and is one of the basic optimisations that any production compiler would implement.
A surprising amount of code might be executed in rarely-used or undocumented code paths (for example, if the DEBUG environment variable is 1 or because a plugin is enabled even if not actually used) and thus not shaken out by the compiler.
Plenty of libraries with "verbose" logging flags ship way more than assumed. I remember lots of NPM libs that require `winston`, for example, are runtime-configurable. Or Java libraries that require Log4J. With Rust it's getting hard to remember, because everything today seems to pull in the fucking kitchen sink...
And even going beyond "debug", plenty of libraries ship features that are downright unwanted by consumers.
The two famous recent examples are Heartbleed and Log4shell.
> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
Clarification: Go allows for very simple multi-file packages. It's one feature I really like, because it allows splitting an otherwise coherent module into logical parts.
I probably mischaracterized this, as it's been a while since I did more than trivial Rust. AFAIK it's not possible to depend on only a part of a module in Rust though, right? (At least without an external build system.)
For example, you can't split up a module into foo.rs containing `Foo` and bar.rs containing `Bar`, both in module 'mymod' in such a way that you can `use mymod::Bar and foo.rs is never built/linked.
My point is that the granularity of the package/mod encourages coarse-grained deps, which I argue is a problem.
> not possible to depend on only a part of a module in Rust though right
yesn't, you can use feature flags similar to `#if` in C
but it's also not really a needed feature, as dead code elimination will prune out all the functions, types, etc. you don't use. None of it will end up in the produced binary.
Yeah, likewise Rust is completely fine after you say `mod foo` and have a file named foo.rs, if you also make a foo/ directory and put foo/whatever.rs and foo/something_else.rs that those are all part of the foo module.
Historically Rust wanted that foo.rs to be renamed foo/mod.rs but that's no longer idiomatic although of course it still works if you do that.
In Rust, crates are semantically one compilation unit (where in C, oversimplified, it's a .h/.c pair; in practice rustc will try to split a crate into more units to speed up build time).
The reason I'm pointing this out is that many instances of "splitting a module across files" come from situations where one file is one compilation unit, so you needed a way to split it (for organization) without splitting it (for compilation).
Not just multiple files, but multiple directories. One versioned dependency (module) usually consists of dozens of directories (packages) and dozens to hundreds of files. Only newcomers from other languages create too many go.mod files when they shouldn't.
which gets reinvented all the time, like in .NET with "trimming" or in JS with "tree-shaking".
C/C++ compilers have been doing that since before .NET was a thing; same for Rust, which has done it since its 1.0 release (because it's done by LLVM ;) )
The reason it gets reinvented all the time is that while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.
Which also brings us to one area where it's not out of the box: if you build a .dll/.so in one build process and then use it in another. Here additional tooling is needed to prune the dynamically linked libraries. But luckily it's not a common problem to run into in Rust.
In general, most code size problems in Rust aren't caused by a huge LOC count in dependencies but by an overuse of monomorphization. The problem of tons of LOC in dependencies is one of supply chain trust and reviewability more than anything else.
> The reason it gets reinvented all the time is that while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.
It seems to me in a strict sense the problem of eliminating dead code may be impossible for code that uses some form of eval(). For example, you could put something like eval(decrypt(<encrypted code>,key)), for a user-supplied key (or otherwise obfuscated); or simply eval(<externally supplied code>); both of which could call previously dead code. Although it seems plausible to rule out such cases. Without eval() some of the problem seems very easy otoh, like unused functions can simply be removed!
And of course there are more classical impediments, halting-problem like, which in general show that telling if a piece of code is executed is undecidable.
( Of course, we can still write conservative decisions that only cull a subset of easy to prove dead code -- halting problem is indeed decidable if you are conservative and accept "I Don't Know" as well as "Halts" / "Doesn't Halt" :) )
Yes, even without Eval, there's a ton of reflective mechanisms in JS that are technically broken by dead code elimination (and other transforms, like minification), but most JS tools make some pretty reasonable assumptions that you don't use these features. For example, minifiers assume you don't rely on specific Function.name property being preserved. Bundlers assume you don't use eval to call dead code, too.
> Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call,
It’s getting hard to take these conversations seriously with all of the hyperbole about things that don’t happen. Nobody is producing Rust binaries that hit 500MB or even 50MB from adding a couple simple dependencies.
You’re also not ending up with mountains of code that never gets called in Rust.
Even if my Rust binaries end up being 10MB instead of 1MB, it doesn’t really matter these days. It’s either going on a server platform where that amount of data is trivial or it’s going into an embedded device where the few extra megabytes aren’t really a big deal relative to all the other content that ends up on devices these days.
For truly space-constrained systems there's no-std and an entire, albeit small, separate universe of packages that operate in that space.
For all the doom-saying, in Rust I haven’t encountered this excessive bloat problem some people fret about, even in projects with liberal use of dependencies.
Every time I read these threads I feel like the conversations get hijacked by the people at the intersection of “not invented here” and nostalgia for the good old days. Comments like this that yearn for the days of buying paid libraries and then picking them apart anyway really reinforce that idea. There’s also a lot of the usual disdain for async and even Rust itself throughout this comment section. Meanwhile it feels like there’s an entire other world of Rust developers who have just moved on and get work done, not caring for endless discussions about function coloring or rewriting libraries themselves to shave a few hundred kB off of their binaries.
I agree on the bloat: considering my Rust projects typically don't use any shared libraries other than a libc, a few MB for a binary including hundreds of crates in dependencies (most of which are part of rustc or cargo itself) doesn't seem so bad. I do get the async thing. It just isn't the right tool for most of my needs. Unless you are in the situation where you need to wait faster (for connections, usually), threads are better for trying to compute faster than async is.
I don't think libraries are the problem, but we don't have a lot of visibility after we add a new dependency. You either take the time to look into it, or just add it and then forget about the problem (which is kind of the point of having small libraries).
It should be easy to build and deploy profiling-aware builds (PGO/BOLT) and to get good feedback around time/instructions spent per package, as well as a measure of the ratio of each library that's cold or thrown away at build time.
I agree that I don't like thinking of libraries as the problem. But they do seem to be the easiest area to point at for a lot of modern development hell. It is kind of crazy.
I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.
Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.
This gets equally frustrating when our metrics for determining the safety of something largely discourage inaction on any dependencies. They have to add to it, or people think it is abandoned and not usable.
Note that this isn't unique to software, mind. Hardware can and does go through massive changes over the years. They have obvious limitations that slow down how rapidly they can change, of course.
> Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.
I'm not sure what the problem is here.
Are you after pinning dependencies to be sure they didn't change? Generally I want updating dependencies to fix bugs in them.
Are you after trusting them through code review or tests? I don't think there's shortcuts for this. You shouldn't trust a library, changing or not, because old bugs and new vulnerabilities make erring on both sides risky.
On reviewing other's code, I think Rust helps a bit by being explicit and fencing unsafe code, but memory safety is not enough when a logic bug can ruin your business. You can't avoid testing if mistakes or crashes matter.
> I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.
Well, it's not required to trim code that you can prove unreachable, true. But I was thinking about trying to measure whether a given library really pulls its non-zero weight, and how much CPU is spent in it.
A library taking "too much time" for something you think can be done faster might need replacement, or swapping for a simple implementation (say the library cares about edge cases you don't face or can avoid).
It's a terrible idea because you're trying to reinvent section splitting + `--gc-sections` at link time, which rust (which the article is about) already does by default.
The article is about Rust, but I was commenting on dependencies in general.
Things like --gc-sections feels like a band-aid, a very practical and useful band-aid, but a band-aid none the less. You're building a bunch of things you don't need, then selectively throwing away parts (or selectively keeping parts).
IMO it all boils down to the granularity. The granularity of text source files, the granularity of units of distribution for libraries. It all contributes to a problem of large unwieldy dependency growth.
I don't have any great solutions here; it's just observations of the general problem from the horrifying things that happen when dependencies grow uncontrolled.
As far as I'm aware, LTO completely solves this from a binary size perspective. It will optimise out anything unused. You can still get hit from a build time perspective though.
"completely solves" is a bit of an overstatement. Imagine a curl-like library that allows you to make requests by URL. You may only ever use HTTP urls, but code for all the other schemas (like HTTPS, FTP, Gopher) needs to be compiled in as well.
This is an extreme example, but the same thing happens very often at a smaller scale. Optional functionality can't always be removed statically.
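A small sketch of why (function names are hypothetical): when the scheme is only known at run time, every handler is reachable as far as the compiler and linker are concerned, so none of them can be dropped.

```rust
// All three handlers stay in the binary, because nothing proves at compile
// time that `url` will only ever be an http URL.
fn fetch(url: &str) -> Result<String, String> {
    match url.split("://").next().unwrap_or("") {
        "http" => http_get(url),
        "https" => https_get(url), // kept even if you never pass https URLs
        "ftp" => ftp_get(url),     // kept even if you never pass ftp URLs
        other => Err(format!("unsupported scheme: {other}")),
    }
}

// Stand-ins for the real transport code.
fn http_get(url: &str) -> Result<String, String> { Ok(format!("GET {url} over http")) }
fn https_get(url: &str) -> Result<String, String> { Ok(format!("GET {url} over https")) }
fn ftp_get(url: &str) -> Result<String, String> { Ok(format!("RETR {url} over ftp")) }
```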
That only applies when dynamic dispatch is involved and the linker can't trace the calls. For direct calls and generics(which idiomatic Rust code tends to prefer over dyn traits) LTO will prune extensively.
Depends on what is desired; in this case it would fail (through the `?`) and report that it's not a valid HTTP URI. This would be for a generic parsing library that allows multiple schemes to be parsed, each with their own parsing rules.
If you want to mix schemes you would need to be able to handle all schemes; you can either go through all the variations (through the same generics) you want to test, or just accept that you need a full URI parser and lose the generics.
See, the trait system in Rust actually forced you to discover your requirements at a very core level. It is not a bug, but a feature. If you need HTTPS, then you need to include the code to do HTTPS of course. Then LTO shouldn't remove it.
If your library cannot parse FTP, either you enable that feature, add that feature, or use a different library.
I guess that depends on the implementation. If you're calling through an API that dynamically selects the protocol than I guess it wouldn't be removable.
Rust does have a feature flagging system for this kind of optional functionality though. It's not perfect, but it would work very well for something like curl protocol backends though.
That's a consequence of crufty complicated protocols and standards that require a ton of support for different transports and backward compatibility. It's hard to avoid if you want to interoperate with the whole world.
yes, it's not an issue of code size but an issue of supply chain security/reviewability
it's also not always a fair comparison: if you include Tokio in LOC counting, then you surely would also include the V8 LOC when counting for Node, or the JRE for Java projects (but not the JDK), etc.
And, reductio ad absurdum, you perhaps also need to count those 27 million LOC in Linux too. (Or however many LOC there are in Windows or macOS or whatever other OS is a fundamental "dependency" for your program.)
It's certainly better than in Java where LTO is simply not possible due to reflection. The more interesting question is which code effectively gets compiled so you know what has to be audited. That is, without disassembling the binary. Maybe debug information can help?
Yet it works, thanks to additional metadata: either in a dynamic compiler, which effectively does it in memory, throwing away execution paths with traps to redo them when required, or with PGO-like metadata for AOT compilation.
And since we are always wrong unless proven otherwise,
In Go, the symbol table contains enough information to figure this out. This is how https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck is able to limit vulnerabilities to those that are actually reachable in your code.
It's possible and in recent years the ecosystem has been evolving to support it much better via native-image metadata. Lots of libraries have metadata now that indicates what's accessed via reflection and the static DCE optimization keeps getting better. It can do things like propagate constants to detect more code as dead. Even large server frameworks like Micronaut or Spring Native support it now.
The other nice thing is that bytecode is easy to modify, so if you have a library that has some features you know you don't want, you can just knock it out and bank the savings.
Yes, jlink, code guard, R8/D8 on Android, if you want to stay at the bytecode level, plus all the commercial AOT compilers and the free beer ones, offer similar capabilities at the binary level.
Everywhere in this thread is debating whether LTO "completely" solves this or not, but why does this even need LTO in the first place? Dead code elimination across translation units in C++ is traditionally accomplished by something like -ffunction-sections, as well as judiciously moving function implementations to the header file (inline).
Clang also supports virtual function elimination with -fvirtual-function-elimination, which AFAIK currently requires full LTO [0]. Normally, the virtual functions can't be removed because the vtable is referencing them. It's very helpful in cutting down on bloat from our own abstractions.
LTO only gets you so far, but IMO it's more kicking the can down the road.
The analogy I use is cooking a huge dinner, then throwing out everything but the one side dish you wanted. If you want just the side-dish you should be able to cook just the side-dish.
LTO gets a lot of the way there, but it won't for example help with eliminating unused enums (and associated codepaths). That happens at per-crate MIR optimisation iirc, which is prior to llvm optimisation of LTO.
The actual behavior of Go seems much closer to your ideal scenario than what you attribute to it, although it is more nuanced, so both are true. In Go, a module is a collection of packages. When you go get a module, the entire module is pulled onto the host, but when you vendor, only the packages you use (and I believe only the symbols used from each package, but am not certain) are vendored to your module as dependencies.
There's an interesting language called Unison, which implements part of this idea (the motivation is a bit different, though)
Functions are defined by AST structure and are effectively content addressed. Each function is then keyed by hash in a global registry where you can pull it from for reuse.
> In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
What? I don't know about Go, but this certainly isn't true in Rust. Rust has great support for fine-grained imports via Cargo's ability to split up an API via crate features.
> At each level a caller might need 5% of the functionality of any given dependency.
I think that is much more of a problem in ecosystems where it is harder to add dependencies.
When it is difficult to add dependencies, you end up with large libraries that do a lot of stuff you don't need, so you only need to add a couple of dependencies. On the other hand, if dependency management is easy, you end up with a lot of smaller packages that just do one thing.
> The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
Or you have ultra-fine-grained modules, and rely on existing tree-shaking systems.... ?
If you think about it, every function already declares what it needs simply by actually using it. You know if a function needs another function because it calls it. So what exactly are you asking? That the programmer insert a list of dependent functions in a comment above every function? The compiler could do that for you. The compiler could help you and go up a level and insert the names of modules the functions belong to?
My understanding is that the existing algorithms for tree shaking (dead code elimination, etc. etc. whatever you want to call it) work exactly on that basis. But Python is too dynamic to just read the source code and determine what's used ahead of time. eval and exec exist; just about every kind of namespace is reflected as either a dictionary or an object with attributes, and most are mutable; and the import system works purely at runtime and has a dazzling array of hooks.
> The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library.
That’s literally the JS module system? It’s how we do tree shaking to get those bundle sizes down.
As many others as mentioned, "tree shaking" is just a rebranded variation of dead code elimination which is a very old idea. I don't think JS does what OP is suggesting anyway, you certainly don't declare the exact dependencies of each function.
Small libraries are nice to reduce stuff, but are npm's isEven, isOdd and leftpad really the right solution? Instead of a bunch of small libraries maintained by many individual maintainers, I'd prefer a larger lib maintained by a group, where continuity is more likely and the different parts work together.
I am just a college student, so sorry if this is stupid, but we know that the Rust compiler can detect unused code, variables, functions and all, as can IDEs for all languages, so why don't we just remove those parts? The unused code is just not compiled.
Mainly because in some libs some code is activated at runtime.
A lot of the bloat comes from functionality that can be activated via flags, methods that set a variable to true, environment variables, or even via configuration files.
Sure, but I'm talking about bloat in libraries that don't get LTO'd. If there are no feature flags and no plugin functionality, LTO can't do its job. There are plenty of non-core libraries like this.
OTOH it also depends on the architecture you build. If you have a local-first thick client, the initial install of 800 MB is less relevant if after install you communicate on a tightly controlled (by you) p2p networking stack, but take on heavy dependencies in the UI layer to provide, e.g., infinite-canvas-based collaboration and diagramming.
This has been the #1 way to achieve code re-use and I am all for it. Optimize it in post where it is necessary and build things faster with tested code.
> In the 80s, the idea of a library of functionality was something you paid for, and painstakingly included parts of it into your size-constrained environment (it had to fit on a floppy). You probably picked apart that library and pulled out the bits you needed, integrating them into your builds to be as small as possible.
If anything, the 1980s is when the idea of fully reusable, separately-developed software components first became practical, with Objective-C and the like. In fact it's a significant success story of Rust that this sort of pervasive software componentry has now been widely adopted as part of a systems programming language.
You're talking about different 80s. On workstations and Unix mainframes, beasts like Smalltalk and Objective C roamed the Earth. On home computers, a resident relocatable driver that wasn't part of ROM was an unusual novelty.
Size issues and bloat can be solved by tree shaking which is orthogonal to granularity of the package ecosystem. It doesn't matter for server side (at least people don't care). On client side, most ecosystems have a way to do it. Dart does it. Android does it with proguard.
The more pressing issue with dependencies is supply chain risks including security. That's why larger organizations have approval processes for using anything open source. Unfortunately the new crop of open source projects in JS and even Go seem to suffer from "IDGAF about what shit code from internet I am pulling" syndrome.
Unfortunately granularity does not solve that as long as your 1000 functions come from 1000 authors on NPM.
A consideration that is often overlooked is that the waste accumulates exponentially!
If each layer of “package abstraction” is only 50% utilised, then each layer multiplies the total size by 2x over what is actually required by the end application.
Three layers — packages pulling in packages that pull their own dependencies — already gets you to 88% bloat! (Or just 12% useful code)
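Spelling out the arithmetic behind that, under the stated 50%-utilisation-per-layer assumption:

```latex
\text{useful fraction} = 0.5^{3} = 12.5\%
\qquad\Longrightarrow\qquad
\text{bloat} = 1 - 0.5^{3} = 87.5\% \approx 88\%
```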
An example of this is the new Windows 11 calculator that can take several seconds to start because it loads junk like the Windows 10 Hello for Business account recovery helper library!
Why? Because it has currency conversion, which uses a HTTP library, which has corporate web proxy support, which needs authentication, which needs WH4B account support, which can get locked out, which needs a recovery helper UI…
…in a calculator. One that you can’t launch unless you have already logged in successfully, and which is definitely not the “right place” for account recovery workflows to be kicked off.
But… you see… it’s just easier to package up these things and include them with a single line in the code somewhere.
Dead code elimination means binary size bloat does not follow from dependency bloat. So this point is pretty much invalid for a compiled language like Rust.
I can't remember the last time I saw someone so conclusively demonstrate they know nothing about the basics of how libraries, compilers, and linkers work.
Agreed it’s a problem, and I can’t propose a solution other than what you’ve suggested: referencing functions by their value (tl;dr: hashing them), kind of like what Unison(?) proposes.
But I think the best defense against this problem at the moment is to be extremely defensive/protective of system dependencies. You need to not import that random library that has a 10-line function. You need to just copy that function into your codebase. Don’t just slap random tools together. Developing libraries in a maintainable and forward-looking manner is the exception, not the rule. Some ecosystems excel here, but most fail. Ruby and JS are probably among the worst. Try upgrading a Rails 4 app to modern tooling.
So… be extremely protective of your dependencies. Very easy to accrue tech debt with a simple library installation. Libraries use libraries. It becomes a compounding problem fast.
Junior engineers seem to add packages to our core repo with reckless abandon and I have to immediately come in and ask why was this needed? Do you really want to break prod some day because you needed a way to print a list of objects as a table in your cli for dev?
I'm curious if Rust has this problem. The problem I notice in npm land is many developers have no taste. Example: there's a library for globbing called glob. You'd think it would just be a function that does globbing, but no, the author decided it should ALSO be a standalone command-line executable, and so it includes a large command-line option parser. They could have easily made a separate command-line tool that includes a library that does the glob, but no, this is a common and shit pattern in npm. I'd say easily 25% or more of all "your dependencies are out of date" messages are related to the argument parsing for the command-line tool in these libraries. That's just one example.
Also, there's arguably a design question. Should a 'glob' library actually read the file system and give you filenames, or should it just tell you if a string matches a glob and leave the rest to you? I think it's better design to do the latter, the simplest thing. This means fewer dependencies and more flexibility. I don't have to hack it or add an option to use my own file system (like for testing). I can use it with a change-monitoring system, etc...
And I'm sure there are tons of devs that like that glob is a "do everything for me" library instead of a "do one specific thing" library, which makes it worse, because you get more "internet points" the less your library requires the person using it to be a good dev.
I can't imagine it's any different in rust land, except maybe for the executable thing. There's just too many devs and all of them, including myself, don't always make the best choices.
> Should a 'glob' library actually read the file system and give you filenames
The POSIX glob function after which these things are named traverses the filesystem and matches directory entries.
The pure matching function which matches a glob pattern against a filename-like string is fnmatch.
But yes, the equivalent of fnmatch should be a separate module and that could be a dependency of glob.
Nobody should be trying to implement glob from scratch using a fnmatch-like function and directory traversal. It is not so trivial.
glob performs a traversal that is guided by the pattern. It has to break the pattern into path components.
It knows that "*/*/*" has three components and so the traversal will only go three levels deep. Also "dir/*" has a component which is a fixed match, and so it just has to open "dir" without scanning the current directory; if that fails, glob has failed.
If the double star **, which matches multiple components, is supported, it is likewise best integrated into glob.
If brace expansion is supported, that adds another difficulty because different branches of a brace can have different numbers of components, like {*/x,*/*/x,*/*/*/x}. To implement glob, it would greatly help us to have brace expansion as a separate function which expands the braces, producing multiple glob patterns, which we can then break into path components and traverse.
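A rough sketch of the component-splitting step described above (nothing here comes from a real glob crate; the names and structure are illustrative):

    /// One path component of a glob pattern.
    #[derive(Debug)]
    enum Component<'a> {
        /// A fixed name like "dir": just try to open it, no directory scan.
        Literal(&'a str),
        /// A component containing wildcards like "*" or "a?c": scan and match.
        Wildcard(&'a str),
    }

    /// Split a pattern like "dir/*/*.txt" into per-directory components so
    /// the traversal only descends as deep as the pattern itself.
    fn split_pattern(pattern: &str) -> Vec<Component<'_>> {
        pattern
            .split('/')
            .map(|part| {
                if part.contains(|c: char| matches!(c, '*' | '?' | '[')) {
                    Component::Wildcard(part)
                } else {
                    Component::Literal(part)
                }
            })
            .collect()
    }

    fn main() {
        // "*/*/*" has three components, so the walk goes three levels deep.
        println!("{:?}", split_pattern("*/*/*"));
        // "dir/*" starts with a fixed component: just open "dir" directly.
        println!("{:?}", split_pattern("dir/*"));
    }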
They eventually fixed it, but grunt once upon a time used a glob implementation that could not short-circuit on wildcards in ignore patterns. So I caught it scanning the node_modules directory and then dropping every file it found because it matched on “node_modules/**”. Builds got a lot faster when I pushed that update out.
There’s a lot of stupid ways to implement glob and only a couple of smart ones.
Well, fnmatch really does two things: it parses the pattern and then applies that to a string. So really, there should be a "ptnparse" library that handles the pattern parsing, which fnmatch has as a dependency.
Though, thinking it through, the "ptnparse" library is responsible for patterns matching single characters and multiple characters. We should split that up into "singleptn" and "multiptn" libraries that ptnparse can take as dependencies.
Oh, and those flags that fnmatch takes make fnmatch work in several different ways; let's decompose those into three libraries so that we only have to pull in the matcher we care about: pthmatch, nscmatch, and prdmatch. Then we can compose those libraries based on what we want in fnmatch.
This is perfect, now if we don't care about part of the fnmatch functionality, we don't have to include it!
/s
This decomposition is how we wind up with the notorious leftpad situation. Knowing when to stop decomposing is important. fnmatch is a single function that does less than most syscalls. We can probably bundle that with a few more string functions without actually costing us a ton. Glob matching at a string level probably belongs with all the other string manipulation functions in the average "strings" library.
Importantly, my suggestion that fnmatch belongs in a "strings" library does align with your suggestion that fnmatch shouldn't be locked into a "glob" library that also includes the filesystem traversal components.
> I can't imagine it's any different in [R]ust land
Taste is important; programmers with good architectural taste tend to use languages that support them in their endeavour (like Rust or Zig) or at least get out of the way (C).
So I would argue the problems you list are statistically less often the case than in certain other languages (from COBOL to JavaScript).
> There's just too many devs and all of them, including myself, don't always make the best choices.
This point you raise is important: I think an uncoordinated crowd of developers will create a "pile of crates" ("bazaar" approach, in Eric Raymond's terminology), and a single language designer with experience will create a more uniform class library ("cathedral" approach).
Personally, I wish Rust had more of a "batteries included" standard library with systematically named and namespaced official crates (e.g. including all major data structures) - why not "stdlib::data_structures::automata::weighted_finite_state_transducer" instead of a confusing set of choices named "rustfst-ffi", "wfst", ... ?
Ideally, such a standard library should come with the language at release. But the good news is it could still be devised later, because the Rust language designers were smart enough to build versioning with full backwards compatibility (but not technical debt) into the language itself.
My wish for Rust 2030 would be such a stdlib (it could even be implemented using the bazaar of present-day crates, as long as that is hidden from us).
213M downloads, depends on zero external crates, one source file (a third of which is devoted to unit tests), and developed by the rust-lang organization itself (along with a lot of crates, which is something that people tend to miss in this discussion).
The one that shows up first, which is to say, the one with 200 million downloads, which is to say, the one whose name is the exact match for the search query.
That's much more a statement about the search function on crates.io than it is about the number of glob crates. I think if you have the standard glob crate as a dependency you show up in that search.
> Also this crate is from official rust lang repo, so much less prone to individualistic misbehaving.
To reiterate, lots of things that people in this thread are asking the language to provide are in fact provided by the rust-lang organization: regex, serde, etc. The goalposts are retreating over the horizon.
Rust's primary sin here is that it makes dependency usage transparent to the end-user. Nobody wants to think about how many libraries they depend upon and how many faceless people it takes to maintain those libraries, so they're uncomfortable when Rust shows you. This isn't a Rust problem, it's a software complexity problem.
I think the parent was suggesting comparing and contrasting the glob dependency in Rust and npm. A one-off isn't useful, but picking ten random but heavily used packages probably is. The parent didn't really mention what the Node version looked like, though.
The npm glob package has 6 dependencies (those dependencies have 3+ dependencies, those sub dependencies have 6+ dependencies, ...)
As you point out the rust crate is from the official repo, so while it's not part of the standard library, it is maintained by the language maintenance organization.
Maybe that could make it a bad example, but the npm one is maintained by the inventor of npm, who describes himself as "I wrote npm and a pretty considerable portion of other node related JavaScript that you might use." So I would say that makes it a great example: the people I would expect to care the most about the language are the package maintainers of these packages, and they are (hopefully) implementing what they think are the best practices for the language and the ecosystem.
Finding a single library that avoids the problem is pretty useless. You can find great libraries in Node as well but everyone would agree that Node has a dependency problem.
And yet it's telling that, when the author mused about library quality and unknowingly suggested an arbitrary library as an example, the Rust version turned out to be high quality.
Not really, there are plenty of large libraries today that were designed by complete boneheads. Granted, you only notice if you know that domain very well.
> Also, there's arguably a design question. Should a 'glob' library actually read the file system and give you filenames, or should it just tell you if a string matches a glob and leave the rest to you?
What you’re describing regarding glob is not lack of taste, it’s an architectural “bug”.
Taste is what Steve Jobs was referring to when he said Microsoft had none. In software it’s defined by a humane, pleasant design that almost(?) anybody can appreciate.
Programming languages cannot be tasteful, because they require time and effort to learn and understand. Python has some degree of elegance and Golang’s simplicity has a certain je ne sais quoi… but neither really fits the definition.
Still, some technologies such as git, Linux or Rust stand out as particularly obscure even for the average developer, not just average human.
> The problem I notice in npm land is many developers have no taste.
Programming is not the same as hanging out in some hoity-toity art gallery. If someone critiqued my software dev by saying I had "no taste", I'd cringe so hard I'd turn into a black hole.
I know this is hackernews, but this reeks of self-importance.
Engineering is a form of art where the engineer makes many decisions, large and small, where optimality cannot be proven. Taste most certainly plays a role, and there are engineering products that clearly show good or poor taste.
Unfortunately this particular art form requires fluency in mathematics and the sciences/computers, so it’s very inaccessible.
Imagine if a carpenter or house builder was shitting out slop that had no taste. And then laughed at people who pointed it out. Would you hire them to build something for you?
Actually, the problem with SE culture is people think they're way smarter than they really are simply because they grew up being called a genius by knowing how to turn a computer on and off.
> Imagine if a carpenter or house builder was shitting out slop that had no taste
What are you even talking about? How does this even remotely relate to software development? Are you telling me a function that adds 2 numbers has "taste"?
Random remark: I've noticed the quality of rust libraries too. Which made me really surprised to see the overengineered mess that is the async-openai crate.
How can one take an API as simple as OpenAI's and turn it into this steaming pile of manure? In the end, I used reqwest and created my queries manually. I guess that's what everyone does...
The kinds of people who think OpenAI's tech is worth touching with a bargepole are generally not the kinds of people who develop and maintain high-quality Rust libraries.
I get it, but with OpenAI being the hottest stuff happening in software for the past.. x years, I would have assumed there was some kind of official, correctly maintained client for Rust.
I was a bit shocked, to be honest.
Edit: I originally misread your comment. OpenAI is an important tech, no matter what you think of the company itself. Being able to easily interface with their API is important.
A true enough statement, but "Rust" is unnecessarily specific. Dependencies are getting scary in general. Supply chain attacks are no longer hypothetical, they're here and have been for a while.
If I were designing a new language I think I'd be very interested in putting some sort of capability system in so I can confine entire library trees safely, and libraries can volunteer somehow what capabilities they need/offer. I think it would need to be a new language if for no other reason than ecosystems will need to be written with the concept in them from the beginning.
For instance, consider an "image loading library". In most modern languages such libraries almost invariably support loading images from a file, directly, for convenience if nothing else. In a language that supported this concept of capabilities it would be necessary to support loading them from a stream, so either the image library would need you to supply it a stream unconditionally, or if the capability support is more rich, you could say "I don't want you to be able to load files" in your manifest or something and the compiler would block the "LoadFromFile(filename)" function at compile time. Multiply that out over an entire ecosystem and I think this would be hard to retrofit. It's hugely backwards incompatible if it is done correctly, it would be a de facto fork of the entire ecosystem.
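As a concrete sketch of the image-library example (the types and function here are hypothetical, not any real crate's API):

    use std::io::{Cursor, Read};

    // Hypothetical image type, just for illustration.
    struct Image {
        width: u32,
        height: u32,
    }

    // The library never touches the filesystem: the caller must hand it a
    // stream, so whoever actually holds file-opening authority decides what
    // this decoder ever gets to see.
    fn decode_image(mut input: impl Read) -> std::io::Result<Image> {
        let mut header = [0u8; 8];
        input.read_exact(&mut header)?;
        // ... real decoding would happen here ...
        Ok(Image { width: 0, height: 0 })
    }

    fn main() -> std::io::Result<()> {
        // The application (which does have file access) could pass a File
        // here; this example just uses an in-memory buffer.
        let img = decode_image(Cursor::new(vec![0u8; 8]))?;
        println!("{}x{}", img.width, img.height);
        Ok(())
    }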
I honestly don't see any other solution to this in the long term, except to create a world where the vast majority of libraries become untargetable in supply chain attacks because they can't open sockets or read files and are thus useless to attackers, and we can reduce our attack surface to just the libraries that truly need the deep access. And I think if a language came out with this design, you'd be surprised at how few things need the dangerous permissions.
Even a culture of minimizing dependencies is just delaying the inevitable. We've been seeing Go packages getting supply-chain-attacked and it getting into people's real code bases, and that community is about as hostile to large dependency trees as any can be and still function. It's not good enough.
In WUFFS most programs are impossible. Their "Hello, world" doesn't print hello world because it literally can't do that. It doesn't even have a string type, and it has no idea how to do I/O so that's both elements of the task ruled out. It can however, Wrangle Untrusted File Formats Safely which is its sole purpose.
I believe there should be more special purpose languages like this, as opposed to the General Purpose languages most of us learn. If your work needs six, sixteen or sixty WUFFS libraries to load different image formats, that's all fine because categorically they don't do anything outside their box. Yet, they're extremely fast because since they can't do anything bad by definition they don't need those routine "Better not do anything bad" checks you'd write in a language like C or the compiler would add in a language like Rust, and because they vectorize very nicely.
I don't think so. Software is maybe the only "engineering" discipline where it is considered okay to use mainstream tools incorrectly and then blame the tools.
Maybe we need a stronger culture of Sans-IO dependencies in general, to the point of calling it out and criticising it, as happens with bad practices and dark patterns. A new lib (which shouldn't be using its own file-access code) is announced on HN, and the first comment is: "why do you do your own IO?"
Edit - note it's just tongue in cheek. Obviously libraries being developed against public approval wouldn't be much of a good metric. Although I do agree that a bit more of a common culture around Sans-IO principles would be a good thing.
I don't think retrofitting existing languages/ecosystems is necessarily a lost cause. Static enforcement requires rewrites, but runtime enforcement gets you most of the benefit at a much lower cost.
As long as all library code is compiled/run from source, a compiler/runtime can replace system calls with wrappers that check caller-specific permissions, and it can refuse to compile or insert runtime panics if the language's escape hatches would be used. It can be as safe as the language is safe, so long as you're ok with panics when the rules are broken.
It'd take some work to document and distribute capability profiles for libraries that don't care to support it, but a similar effort was proven possible with TypeScript.
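As a toy illustration of what such a runtime wrapper could look like (everything here is hypothetical; a real implementation would rewrite the calls during compilation rather than trust callers to use the wrapper):

    use std::collections::HashSet;
    use std::fs::File;
    use std::io;
    use std::sync::OnceLock;

    // A process-wide allowlist for simplicity; a real system would key the
    // policy by the calling crate, derived from per-library capability profiles.
    static ALLOWED_PREFIXES: OnceLock<HashSet<&'static str>> = OnceLock::new();

    // The wrapper the compiler/runtime would substitute for std::fs::File::open.
    fn checked_open(caller: &str, path: &str) -> io::Result<File> {
        let allowed = ALLOWED_PREFIXES.get_or_init(|| HashSet::from(["/tmp/", "./assets/"]));
        if allowed.iter().any(|prefix| path.starts_with(prefix)) {
            File::open(path)
        } else {
            // Or panic, matching the "as safe as the language is safe" framing.
            Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                format!("{caller} is not allowed to open {path}"),
            ))
        }
    }

    fn main() {
        match checked_open("some_image_crate", "/etc/passwd") {
            Ok(_) => println!("opened"),
            Err(e) => println!("blocked: {e}"),
        }
    }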
I actually started working on a tool like that for fun, at each syscall it would walk back up the stack and check which shared object a function was from and compare that to a policy until it found something explicitly allowed or denied. I don't think it would necessarily be bulletproof enough to trust fully but it was fun to write.
I love this idea. There is some reminiscence of this in Rust, but it's opt-in and based on convention, and only for `unsafe` code. Specifically, there's a trend of libraries using `#![deny(unsafe_code)]` (which will cause a compilation error if there is any `unsafe` code in the current crate), and then advertising this to their users. But there's no enforcement, and the library can still add `#[allow(unsafe_code)]` to specific functions.
Perhaps a capability system could work like the current "feature" flags, but for the standard library, which would mean they could be computed transitively.
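For reference, the `#![deny(unsafe_code)]` convention mentioned above looks roughly like this:

    // lib.rs
    #![deny(unsafe_code)] // any `unsafe` in this crate becomes a compile error

    pub fn safe_api(data: &[u8]) -> usize {
        data.len()
    }

    // ...but the lint can still be overridden locally, so this is a convention
    // rather than an enforced guarantee:
    #[allow(unsafe_code)]
    pub fn escape_hatch(ptr: *const u8) -> u8 {
        unsafe { *ptr }
    }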
I love this idea and I hope I get to work on it someday. I've wanted this ever since I was a starry-eyed teenager on IRC listening to Darius Bacon explain his capability-based OS idea, aptly called "Vapor".
I've thought about this (albeit not for that long) and it seems like you'd need a non-trivial revamp of how we communicate with the operating system. For instance, allowing a library to "read from a stream" sounds safe until you realize they might be using the same syscalls as reading from a file!
That's one hell of a task.
First question is how fine-grained your capability system will be. Both in terms of capabilities and who they are granted for.
Not fine-grained enough and everything will need everything, e.g. access to various clocks could be used to DoS you or as a side channel attack. Unsafe memory access might speed up your image parsing but kills all safety.
Similar problems with scope. If per dependency, forces library authors to remove useful functionality or break up their library into tiny pieces. If per function and module you'll have a hard time auditing it all.
Lastly, it's a huge burden on devs to accurately communicate why their library/function needs a specific capability.
We know from JavaScript engines, containerization and WASM runtimes what's actually required for running untrusted code. The overhead is just too large to do it for each function call.
I don't think you need to get very complex to design a language that protects libraries from having implicit system access. If the only place that can import system APIs is in the entry program, then by design libraries need to use dependency injection to facilitate explicit passing of capabilities.
One can take just about any existing language and add this constraint, the problem however is it would break the existing ecosystem of libraries.
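A sketch of that constraint expressed in ordinary Rust, where only main constructs the real IO capability and the library code only sees a trait (all names here are invented):

    use std::fs;
    use std::io;

    // The capability a library must be handed before it can read files.
    trait ReadFile {
        fn read_to_string(&self, path: &str) -> io::Result<String>;
    }

    // Only the entry program defines an implementation backed by real IO.
    struct RealFs;
    impl ReadFile for RealFs {
        fn read_to_string(&self, path: &str) -> io::Result<String> {
            fs::read_to_string(path)
        }
    }

    // A "library" function: it can only read files through the capability it
    // was given. Here that's merely a convention; in a language built around
    // the idea it would be enforced by construction.
    fn count_lines(files: &dyn ReadFile, path: &str) -> io::Result<usize> {
        Ok(files.read_to_string(path)?.lines().count())
    }

    fn main() -> io::Result<()> {
        let n = count_lines(&RealFs, "Cargo.toml")?;
        println!("{n} lines");
        Ok(())
    }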
Yes, there is a sense in which Haskell's "effect systems" are "capability systems". My effect system, Bluefin, models capabilities as values that you explicitly pass around. You can't do I/O unless you have the "IOE" capability, for example.
Is there anything in existence which has a version of this idea? It makes a ton of sense to me, but you are right that it would be practically impossible to do in a current language.
Yes, but you can't enforce this at the language level if your objective is security (at least not for natively-compiled languages). You need OS-level support for capabilities, which some OSes do provide (SeL4, Fuchsia). But if you're in a VM rather than native code then you can enforce capabilities, which is what Wasm does with WASI.
Austral is a really cool experiment and I love how much effort was put into the spec which you've linked to. It explains the need for capabilities and linear types, and how they interact, really well.
Doesn't Haskell do this to some degree with the IO monad? Functions that are not supposed to do IO directly simply have a more specific type signature, like taking in a stream and returning a buffer for example.
Interesting. I hadn't seen it yet. I'll check out how fine-grained it really is. My first concern would (naturally) be network calls, but calling a local service should ideally be distinguishable from calling some address that does not originate in the top level.
If anyone ever checks this thread: it works well. Use the JSON output, and it'll show the call path for each "capability" it detects (network, arbitrary code execution, ...). I use this on the output to organize it into a spreadsheet and scan quickly:
TypeScript ecosystem supports this! An environment without e.g. file operations will simply miss classes that are needed for it, and your compilation will fail.
This is just a modern problem in all software development, regardless of language. We are doing more complex things, we have a much bigger library of existing code to draw from and there are many reasons to use it. Ultimately a dependency is untrusted code, and there's a long road to go in hardening entire systems to make running arbitrary dependencies safe (if it's even possible).
In the absence of a technical solution, all others basically involve someone else having to audit and constantly maintain all that code and social/legal systems of trust. If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.
I'd argue that the severity varies between languages, despite the core problem being universal. Languages with comprehensive standard libraries have an advantage over those with minimal built-in functionality, where people rely on external dependencies even for the most basic things (e.g. see Java/.NET vs JS/Node). Lightweight is not always better.
> Languages with comprehensive standard libraries have an advantage
I don't see the advantage. Just a different axis of disadvantage. Take python for example. It has a crazy big standard library full of stuff I will never use. Some people want C++ to go in that direction too -- even though developers are fully capable of rolling their own. Similar problem with kitchen-sink libraries like Qt. "batteries included" languages lead to higher maintenance burden for the core team, and hence various costs that all users pay: dollars, slow evolution, design overhead, use of lowest common denominator non-specialised implementations, loss of core mission focus, etc.
It's a tradeoff. Those languages also have a very difficult time evolving anything in that standard library because the entire ecosystem relies on it and expects non-breaking changes. I think Rust gets sort of the best of both worlds: dependencies are so easy to install it's almost as good as native, but there's a diversity of options and design choices, easy evolution, and winners naturally emerge - these become as high quality as a stdlib component because they attract people/money to work on them, but with more flexibility to change or be replaced.
> If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.
I think Rust really needs to do more of this. I work with both Go and Rust daily at work, Go has its library game down -- the standard library is fantastic. With Rust it's really painful to find the right library and keep up for a lot of simple things (web, tls, x509, base64 encoding, heck even generating random numbers.)
I disagree. As I see it, Rust's core lib should be for interacting with abstract features (intrinsics, registers, memory, the borrow checker, etc.), and the std lib should be for interacting with OS features (net, IO, threads). Anything else is what Rust excels at implementing, and putting it into the stdlib would restrict the adoption of different implementations.
For example there are currently 3, QUIC (HTTP/3) implementations for rust: Quiche (Cloudflare), Quinn, S2N-QUIC (AWS). They are all spec compliant, but may use different SSL & I/O backends and support different options. 2 of them support C/C++ bindings. 2 are async, 1 is sync.
Having QUIC integrated into the stdlib would mean that all these choices had been made beforehand and were stuck in place permanently, and likely no bindings for other languages would be possible.
Gilad Bracha has a really interesting approach to sandboxing third party libraries: Remove imports, and do everything with dependency injection. That way if you never inject say the IO subsystem, the third party code won't be able to break out. And there's no overhead, since it's all based on capabilities.
Even cooler, if you want to only expose read operations, you can wrap the IO library in another library that only exposes certain commands (or custom filtering, etc).
EDIT: I should say this doesn't work with systems programming, since there's always unsafe or UB code.
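A small sketch of the wrapping idea (made-up traits and an in-memory stand-in so it runs without touching the real filesystem):

    use std::collections::HashMap;
    use std::io;

    // A broad IO capability owned by the application...
    trait FileIo {
        fn read(&self, path: &str) -> io::Result<String>;
        fn write(&mut self, path: &str, data: &str) -> io::Result<()>;
    }

    // ...and a narrower, read-only view to hand to third-party code.
    struct ReadOnly<'a>(&'a dyn FileIo);

    impl ReadOnly<'_> {
        fn read(&self, path: &str) -> io::Result<String> {
            self.0.read(path)
        }
        // No write() here, so code holding only a ReadOnly can never write.
    }

    // Toy in-memory implementation so the example runs without real IO.
    struct MemFs(HashMap<String, String>);
    impl FileIo for MemFs {
        fn read(&self, path: &str) -> io::Result<String> {
            self.0
                .get(path)
                .cloned()
                .ok_or_else(|| io::Error::from(io::ErrorKind::NotFound))
        }
        fn write(&mut self, path: &str, data: &str) -> io::Result<()> {
            self.0.insert(path.to_string(), data.to_string());
            Ok(())
        }
    }

    fn main() -> io::Result<()> {
        let mut fs = MemFs(HashMap::new());
        fs.write("config.toml", "key = 1")?;
        let third_party_view = ReadOnly(&fs);
        println!("{}", third_party_view.read("config.toml")?);
        Ok(())
    }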
Maybe we should have a way to run every single library we use in an isolated environment and have a structure like QubesOS. Your main code is dom0 and you can create bunch of TemplateVMs which are your libraries and then create AppVMs for using those libraries. Use network namespaces for communicating between these processes. For sensitive workloads (finance, healthcare, etc), it makes sense to deploy something like that
Regardless of language, really? I highly doubt that, you don't generally see such problems with C or even C++ because dependencies are more cumbersome to add, especially in a way that's cross-platform.
With C++ it's hilarious because the C++ community is so allergic to proper dependency management and also so desperate for stuff from third party libraries that the committee spends large amounts of its time basically doing dependency management for the community by baking in large features you'd ordinarily take as a dependency into the mandatory standard library.
I'm sure I'll miss some, but IIRC C++ 26 is getting the entire BLAS, two distinct delayed reclamation systems and all of the accompanying infrastructure, new container types, and a very complicated universal system of units.
All of these things are cool, but it's doubtful whether any of them could make sense in a standard library; however, for C++ programmers that's the easiest way to use them...
It's bedlam in there and of course the same C++ programmers who claim to be "worried" that maybe somebody hid something awful in Rust's crates.io are magically unconcerned that copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.
> copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.
Is it really that bad? (By my count, as a point of reference, the Python 3.13 standard library is just under 900k lines for the .py files.)
If something is in the standard library, then it’s written and vetted by the standard library provider, not by a random third party like you make it sound.
Maintainers of all open source standard libraries are effectively "random third parties". With heavily used ecosystem dependencies (such as Tokio, but also swaths of small libraries, such as `futures` or `regex`), the number of people who have looked at the code and battle-tested it is also huge.
On crates.io, a good heuristic is to look at two numbers: the number of dependents and the number of downloads. If both are high, it's _probably_ fine. Otherwise, I'll manually audit the code.
That's not a complete solution, especially not if you're worried about this from a security perspective, but it's a good approximation if you're worried about the general quality of your dependencies.
> it’s written and vetted by the standard library provider, not by a random third party
All three modern C++ standard libraries are of course Free Software. They are respectively the GNU libstdc++, Clang's libc++ and the Microsoft STL. Because it's a huge sprawling library, you quickly leave the expertise of the paid maintainers and you're into code that some volunteer wrote for them and says it's good. Sounds like random third parties to me.
Now, I'm sure that Stephan T. Lavavej (the Microsoft employee who looks after the STL, yes, nominative determinism) is a smart and attentive maintainer, and so if you provide a contribution with a function named "_Upload_admin_creds_to_drop_box" he's not going to apply that but equally Stephen isn't inhumanly good, so subtle tricks might well get past him. Similar thoughts apply to the GNU and Clang maintainers who don't have funny names.
One Stephan T. Lavavej is worth more than 1000 random GitHub rustaceans, some of which will be bots, AIs, rank amateurs, bought, or North Korean spies. Each of those libraries has one or more Stephans.
Having paid maintainers, code review, test suites, strict contribution guidelines, etc is state of the art for open source software that some transitive crate dependency can only dream to achieve.
Because most dependencies are either manually installed by the user, or are dynamic libraries that are provided and audited by the distro maintainers. The dependencies are there, they're just harder to see - https://wiki.alopex.li/LetsBeRealAboutDependencies
Sure, there are various dependencies, but it's nothing like "cargo install crate-name". Cargo makes it so effortless to joink the dumbest dependency for the simplest thing.
On the other hand, C/C++ makes it attractive to reinvent the wheel, or vendor the dependency instead. Rather than a single well-tested implementation in the ecosystem for something like sha256, you end up with every application having its own slightly-different, mostly untested, and essentially unmaintained version.
Applications still need the functionality. The need doesn't magically disappear when installing dependencies is a pain. If a crate has a bug, the entire ecosystem can trivially get the fixed version. If the Stackoverflow snippet a C app is vendoring has a bug, that fix is never getting in the app.
That does not help you if the bug is in one of many unmaintained crates and never noticed. Linux distributions aim to make sure that C applications dynamically link to the right libraries instead of vendoring the code. Then the library can be updated once. IMHO this is the only reasonable approach.
> Sure, there are various dependencies, but it's nothing like "cargo install crate-name".
You don't install a Rust crate to use it. We have enough people in this thread trying to authoritatively talk about Rust without having any experience with it, please don't bother leaving a comment if you're just going to argue from ignorance.
Sure, despite all the hate it gets, it is the best experience in C and C++ build tools since forever, short of IDE project files, and it includes IDE integration just like those project files do.
I thought the whole UNIX mentality was worse is better.
No build tool is without issues. My pain points with cargo are: always compiling from source; build caching that requires additional work to set up; and, as soon as it is more than pure Rust, a build.rs file that can get quite creative.
I also don't understand the CMake hate. Modern CMake (3.14+) is just around 10 lines to build basic sources/libraries/executables. And you can either use CMake FetchContent or use CPM https://github.com/cpm-cmake/CPM.cmake to fetch dependencies. No third-party tool like vcpkg or conan is needed.
I think CMake is the perfect balance. You need to write few lines and think about few things before adding a dependency, but usually nothing too crazy. It might not work the first try but that's okay.
Yes, but a lot of the complexity is unnecessary bloat. Almost every project I've ever seen or worked on was full of unnecessary complexity. People naturally tend to over-complicate things, all the programming books, including software design books focus on unimportant aspects and miss all the important ones. It's incredibly frustrating.
Yet, if someone were to write a book which explained things properly (probably a 3000 word article would suffice to turn anyone into a 10x dev), nobody would buy it. This industry is cooked.
I think that https://blessed.rs does a pretty good job of providing recommendations for things that probably can't be crammed into the standard library, but which you'll almost certainly end up needing at one point or another. I honestly like that system a lot, it makes it so that the only packages you need to worry much about are usually doing something rather specific.
It lets you track what packages you "trust". Then you can choose to transitively trust the packages trusted by entities you trust.
This lets you have a policy like "importing a new 3rd party package requires a signoff from our dependency tzar. But, packages that Google claim to have carefully reviewed are fine".
You can also export varying definitions of "trust". E.g. Google exports statements like:
- "this package has unsafe code, one of our unsafe experts audited it and thinks it looks OK"
- "this package doesn't do any crypto"
- "this is a crypto library, one of our crypto experts audited it and thinks it looks ok"
Basically it's a slightly more formal and detailed version of blessed.rs where you can easily identify all the "it's not stdlib, but, it's kinda stdlib" stuff and make it easily available to your team without going full YOLO mode.
It can also give you a "semi-YOLO" approach, it supports rationales like "this package is owned by a tokio maintainer, those folks know what they're doing, it's probably fine". I think this is a nice balance for personal projects.
Cargo makes it so simple to add tons of dependencies that it is really hard not to do it. But that does not stop here: even if I try to be careful with adding dependencies, a couple dependencies are likely to pull tens of transitive dependencies each.
"Then don't depend on them", you say. Sure, but that means I won't write my project, because I won't write those things from scratch. I could probably audit the dependency (if it wasn't pulling 50 packages itself), but I can't reasonably write it myself.
It is different with C++: I can often find dependencies that don't pull tens of transitive dependencies in C++. Maybe because it's harder to add dependencies, maybe because the ecosystem is more mature, I don't know.
But it feels like the philosophy in Rust is to pull many small packages, so it doesn't seem like it will change. And that's a pity, because I like Rust-the-language better than C++-the-language. It just feels like I trade "it's not memory-safe" for "you have to pull tons of random code from the Internet".
I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded. To some degree that is a plus though as you likely trust the maintainers of your OS distribution to provide stable, supported libraries.
As other commenters have said, perhaps this is an area where the Rust maintainers could provide some kind of extended standard library where they don't guarantee backwards compatibility forever, but do provide guarantees about ongoing fixes for security issues.
> I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.
The point wasn't so much about the loading mechanism, but about the fact that the system (especially on Linux) provides them for you; a good amount come pre-installed, and the rest go through a system package manager so you don't have to worry about the language failing to have a good package system.
> some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.
Not in my case. I manually compile all the dependencies (either because I need to cross-compile, or because I may need to patch them, etc). So I clearly see all the transitive dependencies I need in C++. And I need a lot less than in Rust, by a long shot.
Part of the rust dependency issue is that the compiler only multithreads at the crate level currently (slowly being improved on nightly, but there's still some bugs before they can roll out the parallel compiler), so most libraries split themselves up into a ton of small crates because otherwise they just take too long to compile.
edit: Also, `cargo-vet` is useful for distributed auditing of crates. There's also `cargo-crev`, but afaik it doesn't have buy in from the megacorps like cargo-vet and last I checked didn't have as many/as consistent reviews.
It can do. Additionally, because each part is now smaller it's now easier to ensure that each part, in isolation, does what it says on the tin. It also means that other projects can reuse the parts. An example of the last point would be the Regex crate.
Regex is split into subcrates, one of which is regex-syntax: the parser. But that crate is also a dependency of over 150 other crates, including lalrpop, proptest, treesitter, and polars. So other projects have benefited from Regex being split up.
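For instance, a downstream crate can reuse just the parser (a minimal sketch; it assumes the regex-syntax Parser API and a recent version of the crate):

    // Cargo.toml: regex-syntax = "0.8"   (version assumed)
    use regex_syntax::Parser;

    fn main() {
        // Reuse only the parser: we get regex's pattern analysis without
        // pulling in any of the matching engines.
        let hir = Parser::new()
            .parse(r"node_modules/.*\.js")
            .expect("pattern should parse");
        println!("{hir:?}");
    }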
I'll take a bit less stable dependencies over the total mess of C++ dependencies with CMake, shared libraries, version conflicts, etc. any time. There's probably also a bit of an illusion about C++ transitive dependencies due to them usually being precompiled (because compiling them is such a pain).
The whole pkgconfig, cmake, autotools etc ecosystem is insane compared to how Rust and Go do things.
It's part of the reason why software distribution on Linux has been pushed to using containers, removing the point of having shared libraries. I think Google, with its C++ replacement (Carbon), plans on doing its own system.
From my point of view, the issue stems from developers wanting to control distribution. Fine if it's for your own usage, not really if you're planning for others to use it. You will find the most convoluted build systems just because the developers have a pet platform they want to specially support, making it hell to do anything on other platforms.
It could be better, but the current solutions (npm, go, python,...) favor only the developers, not the maintainers and packagers.
There are examples of maintainers/packagers effectively sabotaging other people's projects when making packages for distros, whether that's shipping them broken, keeping ancient versions, etc.
e.g. Bottles, WebkitGTK (distros liked keeping this one held back even though doing so is a security risk)
IMHO it shouldn't be the responsibility of the OS vendor to package third party applications.
Distro maintainers/packagers are who keep the current software stacks running. It's rather amazing how they manage to keep the billion or so lines of separately written code working in unison.
That said, the labor needed to keep the stuff together could be reduced a lot by more ergonomic and universal packaging and distribution methods like Cargo (and, dare I say, npm). I think some kind of better bridge between developers and distros could be found here.
> > I think some kind of a better bridge between developers and distros could be found here.
Every Tom, Dick, and Harry is making their own distro these days (even if they're just respins of Arch with Calamares and some questionable theme settings), so why add more work onto developers?
We have things like Flatpak and Docker now that let application developers ignore the distros and stop them breaking things, unless you're Ubuntu, which is constantly begging to get purchased by Microsoft.
> I think some kind of a better bridge between developers and distros could be found here.
I don’t think there’s a need to do so. Only discipline is needed, by using stable and mature dependencies, and documenting the building process. And maybe some guides/scripts for the most popular distros.
> It's part of the reason why software distribution on Linux has been pushed to using containers
My understanding of people distributing their software in containers is that they can't be arsed to learn how to do it properly. They would install their software and ship the entire computer if that was cost effective.
What needs to be "learned properly" is sadly a huge pile of incoherent legacy cruft that ideally wouldn't be there at all.
This is not to denigrate the huge and critical effort that makes current computing possible, and that is likely unavoidable in the real world. But software distribution needs to evolve.
> What needs to be "learned properly" is sadly a huge pile of incoherent legacy cruft
I don't find it incoherent, nor huge. Unless the bar for "huge" is "anything that requires more attention than asking an LLM and copy-pasting its answer", maybe.
It's not a case of 'learning to do it properly', it's a case of a huge amount of effort to deal with arbitrary differences between distros, as well as fighting with distro policies that would rather ship the software with known bugs than allow two versions of a library to exist on the system.
> it's a case of a huge amount of effort to deal with arbitrary differences between distros
That is not at all a problem for open source stuff: build your project correctly, and let distros do their job. Still, open source projects are too often doing it wrong, because nobody can be arsed to learn.
> as well as fighting with distro policies that would rather ship the software with known bugs than allow two versions of a library to exist on the system.
Sounds like if you need this, you're doing it wrong. If it's a major update (e.g. 2.3.1 to 3.0.0), it's totally possible to have a new package (say `python2` and `python3`). If your users need two versions of a library that are in the same major version (e.g. 2.3.1 and 2.5.4), then you as a developer are doing it wrong. No need to fight, just learn to do it properly.
> the philosophy in Rust is to pull many small packages
I'm not sure it's a philosophy, more a pragmatic consideration for compilation speeds. Anyone who's done a non-trivial amount of Rust knows that moment when the project gets too big and needs to split into separate crates. It's kinda sad that you can't organize code according to proper abstractions, many times I feel forced to refactor for compiler performance.
My point is that if, in the language, everybody were incentivised to use fewer dependencies, then a random library that I would not write myself (because it is an entire project in itself) would have fewer dependencies. Because that is not the case, either I take that library and accept its transitive dependencies, or I don't have a library at all.
In Rust, I'm sometimes actually tempted to wrap a C/C++ library (and its few dependencies) instead of getting the Rust alternative (and its gazillion dependencies).
And you need to think a bit about that (probably not very hard), to help you decide whether I'm irrational or whether you may not have totally understood my point.
I wasted 6 hours yesterday on getting the Bullet examples to compile outside of Bullet itself, with no success. It's more likely that a lot of software simply doesn't get written because C++ and CMake are a pain in the ass.
I find CMake pretty easy, and I only use a few core features of it. Usually the pain comes from completely wrong setups by people who didn't learn the basics. But it's true of everything, I think.
I feel like leftpad has given package managers a very bad name. I understand the OP's hesitation, but it feels a little ridiculous to me.
tokio is a work-stealing, asynchronous runtime; this is functionality that elsewhere would be an entire language runtime. Does OP consider it reasonable to audit the entire Go language? Or the V8 engine for Node? V8 is ~10x more lines than tokio.
If Cloudflare uses Node, would you expect Cloudflare to audit v8 quarterly?
How does one approach doing so? Do you open the main.rs file (or whichever is the entry point) and start reading the code and referenced functions in a breadth-first search (BFS) manner?
It'll do that if there isn't a single version that meets both requirements. Which is a great thing, because most other languages will just fail the build in that case (well, there are still cases where it won't even work in Rust, if types from those sub-dependencies are passed between the two closer dependencies).
npm does this (which causes [caused?] the node_modules directory to have a megazillion of files usually, but sometimes "hoisting" common dependencies helps, and there's Yarn's PnP [which hooks into Node's require() and keeps packages as ZIPs], and pnpm uses symlinks/hardlinks)
In the past (not in Rust, but other languages), for important systems, I've instituted policies of minimizing dependencies from these language-specific package repositories, and for the ones you do use, having to copy it to our own repos and audit each update before use.
But that's not practical for all situations. For example, Web frontend developer culture might be the worst environment, to the point you often can't get many things done in feasible time, if you don't adopt the same reckless practices.
I'm also seeing it now with the cargo-culting of opaque self-hosted AI tools and models. For learning and experimenting, I'd spend more time sufficiently compartmentalizing an individual tool than actually using it.
This weekend, I'm dusting off my Rust skills for a small open source employability project (so I can't invest in expensive dependency management on this one). The main thing bothering me isn't allocation management, but the sinking feeling when I watch the cast-of-thousands explosion of transitive dependencies for the UI and async libraries that I want to use. It's only a matter of time before one of those is compromised, if not already, and one is all it takes.
Anything else will get abused in the name of expediency and just-this-one-time.
Also, the process for adding a crate/gem/module/library needs to be the same as anything else: license review, code review, subscription to the appropriate mailing list or other announce channel, and assignment of responsibility. All of these except code review can be really, really fast once you have the process going.
All problems are, at least in part, dependency chain management problems.
I agree that some amount of friction when including third-party dependencies is a vital thing to push people to consider the value versus cost of dependencies (and license review, code review, and channel subscriptions are all incredibly important and almost always overlooked). However, how should this work for transitive dependencies? And the dependencies of _those_ dependencies?
The dependency trees for most interpreted or source-distributed languages are ridiculous, and review of even a few of those seems practically impossible in a lot of development environments.
True, hence we can go to the next level and also deal with limited accounts for developers, and I can tell you most folks on HN would hate to work in such corporate environments.
I'd leave. If I have to beg IT security every other day for something, it's just not worth it. I was in that situation once before and it was endlessly frustrating. It also wasn't even their choice; the CEO dictated it after attending some security talk once upon a time, and then instantly "you can't trust anyone or anything". You can trust my stay there will be short though :)
No doubt, although this is always a job-market situation; in many places around the globe being a developer isn't much different from any other office job, where many folks have to be happy to have a job in the first place.
There are some voices trying to address this security risk (e.g. the proponents of this new RFC: https://github.com/rust-lang/rfcs/pull/3810). However, for some reason (probably culture) there isn't much momentum yet to change the status quo.
> isn't much momentum yet to change the status quo.
it's a complex problem with tons of partial solutions, each of which has tons of ways to implement it, often with no clear winner
i.e. it's the kind of problem that's hard to solve by consensus
e.g. the idea of an extended standard library is old (around since the beginning of Rust), but for years it was believed that it's probably best to make it a separate, independent project/library, for various reasons. One being that the saying "the standard library is the place where code goes to die" has been quite true for multiple ecosystems (most noticeably Python).
As a side note, an ESL wouldn't reduce the LOC count; it would increase it, as long as you fully measure LOCs and don't "skip" over some dependencies.
The rust RFC process has, frankly, become somewhat of a CF.
There's literally 1000s of RFCs for rust with only a small handful that are integrated. Having this forest, IMO, makes it hard for any given proposal to really stand out. Further, it makes duplicate effort almost inevitable.
Rust's RFC process is effectively a dead letter box for most.
I think they could constitute a committee for the RFC review process (in case there is none today), and based on its recommendations, multiple domain-specific teams/groups could be created to review RFCs in a timely manner.
We need a term like “Mature” or similar for dependencies that are done. Mature dependencies have two characteristics:
1. Well defined scope
2. Infrequent changes
Nomad has many of these (msgpack, envparse, cli, etc). These dependencies go years without changing so the dependency management burden rapidly approaches zero. This is an especially useful property for “leaf” dependencies with no dependencies of their own.
I wish libraries could advertise their intent to be Mature. I’d choose a Mature protobuf library over one that constantly tweaked its ergonomics and performance. Continual iterative improvement is often a boon, but sometimes it’s not worth the cost.
Java did this sometimes by essentially adding slightly tidied-up versions of whatever was the de-facto standard to the standard library. Java 1.3 didn't have regexes, but most people were using the same Apache Commons thing, so Java 1.4 added regexes that looked exactly like that. Java's date handling was a pain, so people mostly used Joda-Time; a later Java version added something that mostly works like Joda-Time. Etc.
It is an easy way to get a somewhat OK standard library as the things you add became popular on their own merits at some point.
Once added, the lowest friction path is to just use the standard library; and as it is the standard library you have a slightly better hope someone will care to maintain it. You can still build a better one if needed for your use-case, but the batteries are included for basic usage
I have a lot of sympathy for this viewpoint, but I also ask that we try to remind ourselves. We are asking for professionalism from hobby projects.
If you want a mature protobuf implementation you should probably buy one. Expecting some guy/gal on the internet to maintain one for you for free seems ill-advised.
> I have a lot of sympathy for this viewpoint, but I also ask that we try to remind ourselves. We are asking for professionalism from hobby projects.
Nobody is asking for professional quality standards from hobby projects. At best, they are asking for hobby projects to advertise themselves as such, and not as "this is a library for [x] that you can use in your stuff with the expectations of [maintenance/performance/compatibility/etc.]."
Resume-driven development seems to cause people to oversell their hobby projects as software that is ready to have external users.
> If you want a mature protobuf implementation you should probably buy one
No software is ever developed this way. For some reason, libraries are always free. Approximately nobody will buy paid libraries.
> For some reason, libraries are always free. Approximately nobody will buy paid libraries.
I suspect this is in no small part because figuring out a licensing (edit: pricing!) model that is both appealing to consumers and sustainable for authors is damn near impossible.
> At best, they are asking for hobby projects to advertise themselves as such
That's also work. You don't get to ask the hobby programmer to do your work of vetting serious/maintained projects for you. As the professional with a job, you have to do that. If some rando on GitHub writes in their readme that it's maintained but lies, you're the idiot for believing him. He's probably 12 years old, and you're supposedly a professional.
> No software is ever developed this way.
That's just inaccurate. In my day job we pay for at least 3-4 third-party libraries that we either have support contracts on or that were developed for us along with a support contract. Besides those, there's also the myriad of software products (databases, editors, Prometheus, Grafana) that we pay for.
Software people really underestimate how much business guys are willing to pay for having somebody to call. It's not "infinitely scalable" in the way VC's love, but it's definitely a huge business opportunity.
To add to this, in the gamedev space there are a bunch of middleware libraries that are commonly paid for: fmod/wwise, multiplayer networking sdks, etc.
Thinking about this a bit more, it seems that the reason there isn't a good way to sell licenses to software libraries generally is license enforcement. Unity and Unreal have a licensing system built-in that they enforce against gamedevs. Normal server software has no such thing.
That means the only threat you have as a producer of code (once the code is handed over) is the threat of withdrawing service. That means the only ways to sell licenses are:
* Build your own licensing service (or offer SaaS)
> Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for
There certainly are. I would never say to disregard anything because it was a hobby project. You just don't get to expect it being that way.
My basic point is that a hobby project can never take responsibility. If you have a support contract you are allowed to have some expectation of support. If you do not, then no expectation is warranted and everything you get is a gift.
A "mature" label carries the same problem. You are expecting the author to label something for you. That's work. If you're pulling from the commons, you must respect that people can label stuff whatever they like, and unmotivated blanket lies are not illegal.
A great point! All of the libraries I mentioned are created and maintained by corporations. Hobbyists, as always, are free to do as they please without judgement from me. :)
I will say I get great satisfaction from the little envparse library I wrote needing near-0 maintenance. It’s a rare treat to be able to consider any project truly done.
I feel like the Go ecosystem almost serendipitously has this built in - modules marked v0.X.Y being immature and under development, and v1 or greater being mature, keeping changes mostly down to bug fixes. I think some folks may even follow this convention!
One of the good things about cargo packages is the feature flags. If a repo uses too many dependencies, then it's time to open an issue or PR to hide them behind feature flags. I do that a lot with packages that require std even though they could do with core and alloc.
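For anyone who hasn't done this before, the mechanics are small. A minimal sketch (the feature names follow the common convention; this isn't any particular crate's manifest):

    # Cargo.toml: std support is on by default, but can be switched off,
    # leaving only core + alloc for no_std targets.
    [features]
    default = ["std"]
    std = []
    alloc = []

The crate root then starts with `#![cfg_attr(not(feature = "std"), no_std)]`, std-only code sits behind `#[cfg(feature = "std")]`, and downstream users build with `--no-default-features` to drop std.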
cargo tree helps a lot for viewing the dependency tree. I forget whether it does a LoC count or not...
> to see what lines ACTUALLY get compiled into the final binary,
This doesn't really make much sense, as a lot of the functions that make it into the binary get inlined so heavily that they often become part of the 'main' function.
100%, I still miss feature flags in npm. Is there a package manager that can do this already? I'd love to expand our internal libs with framework-specific code
Rust generates absurd amounts of debug info, so the default debug builds are much much larger.
Zero-cost abstractions don't have zero-cost debug info. In fact, all of the optimized-away stuff is intentionally preserved with full fidelity in the debug info.
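If the size of debug builds bothers you, Cargo lets you dial the debug info down per profile. A sketch using standard profile keys (the values are just one reasonable choice, and "line-tables-only" needs a reasonably recent toolchain):

    [profile.dev]
    debug = "line-tables-only"   # keep enough for backtraces, drop the rest

    [profile.release]
    strip = "debuginfo"          # strip debug info from release binaries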
I agree that relying on unknown dependencies is a risk, but this misses the point IMO. Number of dependencies and disk space are kind of arbitrary.
> Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.
The lightest weight javascript program relies on V8 to run, which has multiple orders of magnitude more dependencies. Most of which you have never heard of.
At least cargo makes it easier to get a clearer picture of what the dependencies are for a program.
If you have one huge dep, it's easier to keep track of whether you're on the latest update, and it's much less likely you'll fat-finger it and import something typosquatting.
Also, if you're in the enterprise, you'll have fewer 100-page SBOM reports.
Unlike my sibling comment, I don't work in SBOM, but if you consider social dynamics and what trust means, it should be pretty obvious that trusting a group of 10 strangers is much less risky than trusting 10 separate strangers.
At the end of the day you are at much higher risk of one of those 10 packages getting owned by some external party, and suddenly the next version is pulling in a bitcoin miner, or something that steals everything it can from your CI/CD, or does a takeover on your customers.
And it's never 10 (well, at least for JS), it's hundreds, or, if your team is insane, thousands.
No, it has very little to do with v8 or any runtime. Those parsers run on any decent and recent enough runtime, including browsers and Node.js. If you look at the actual code, they use basic APIs in the JavaScript language that you can find in almost any other language.
> relies on V8 to run, which has multiple orders of magnitude more dependencies.
Actually, this isn't true. (Or at least wasn't a while back.) I used to work with a bunch of ex-V8 folks and they really despised third-party dependencies and didn't trust any code they didn't write. They used a few third-party libs, but for the most part they tried to own everything themselves.
.. can afford to suffer from not invented here syndrome
.. and are under _massive_ threat of people doing supply chain attacks compared to most other projects (as they end up running on nearly any desktop computer and half the phones out there)
this just isn't viable for most projects, and not just in terms of resource/time investment: reinventing/rewriting everything isn't exactly good for reducing bugs if you don't have reliable access to both resources _and_ expertise. Most companies have to live with having many very average developers and very tight resource limits.
I am counting 13 dependencies, the rest are internal ones. Are any of these superfluous or only needed for small edge cases? Serde seems exactly a case where you absolutely should use an external dependency.
Also, repository size seems an extremely irrelevant metric.
I don't think there is any point in debating this, because apparently you are in the camp of "dependencies are ok", with or without a good reason, when a different camp is "avoid dependencies unless you really have to". You just provided an example of why dependencies explode like this.
> And because you can't see a reason there is none?
Somehow every other JS based parser doesn't do fancy serialization, as far as I can tell. You can come up with reasons of why one might need it, but as a user of the parser, I want the footprint to be small, and that's a requirement. In fact, that's one of the reasons I never used swc parser in my serious projects.
You are just making stuff up. You still can not articulate why these dependencies are unnecessary.
That you in particular might have no use for the features they bring couldn't be more irrelevant. What other parsers are doing could also not be more irrelevant.
> You still can not articulate why these dependencies are unnecessary.
No, because I don't have to answer that question. I can simply choose not to use this project, like what I do with npm projects. There is a project that's 500kb in code with 120 dependencies, when another one is 100kb with 10 dependencies that's also well maintained? I'll choose the latter without question, as long as it satisfies my needs. I don't care why the other one has 120 dependencies or try to justify that.
Why are you complaining that a project you do not care about is using 13 dependencies, all of which, to your knowledge, are absolutely essential for the functionality?
>There is a project that's 500kb in code with 120 dependencies
And therefore some project using 13 dependencies is doing it wrong? What are you on about. Obviously there is an enormous abuse of dependencies in the JS ecosystem, who cares?
Their original complaint was about the project taking 20GB of disk space to compile.
Also they did point out that the parser depends on a serialisation library, so you're also mistaken about parent thinking the dependencies are necessary.
On another note, this pervasive kind of passive aggressive, hand-wavy, tribalistic, blind defense of certain technologies speak volumes about their audiences.
To address a point near the end of the article, here is my [partial] solution that works as a baseline.
Curate a collection of libraries you use and trust. This will probably involve making a number of your own. Wheel-reinvention, if you will. If done properly, even the upfront time cost will pay off in the long run. I am in the minority here, but I roll my own libs whenever possible, and the 3rd-party libs I use are often ones I know, have used before, and have vetted to make sure they have a shallow dependency tree of their own.
Is this sustainable? I don't know. But it's the best I've come up with, in order to use what I see as the best programming language available for several domains.
There are a lot of lightweight, excellent libs that I will use without hesitation and that have wide suitability. Examples:
Heavier, and periodically experience mutual-version hell, but are very useful for GUI programs:
- EGUI
- WGPU
- Winit
On a darker note, the Rust web ecosystem may be permanently lost to async and messy dependencies. Embedded is going that way too, but I have more hope there, and am doing my best to have my own tooling.
Rust really made some unfortunate choices with async: it pollutes everything but isn't generic enough, so now you are married to the runtime, and this bifurcates the whole ecosystem. It is nearly the Phobos/Deimos problem from Dlang, but instead Tokio just took over. One doesn't use Rust anymore, they use Tokio.
Rust will thrive despite the PLT coloring debate. Async frameworks often dominate through winner-takes-all dynamics. Most blog posts on async coloring are pretentious nonsense, and I've faced heavy moderation here for calling out their intellectual bankruptcy. The completely brain dead moralizing arguments from the ignorant deserve intense derision regardless of what HN's official rules are.
Real world software ecosystems evolve slowly, requiring years of debate to shift.
GP's complaint wasn't about the coloring, but about the fact that the basic async API is not enough for most tasks, so you don't only have colored functions, you're now also bound to an async runtime. The world would be much better if most async rust code was agnostic of the async runtime, despite still having the colored functions issue.
Sure, the situation is just vastly better - even now - than standardizing the wrong solution. Each widely used async runtime in Rust has different solutions for a number of problems, and the choice isn't obvious.
For example, `tokio::spawn()` returns a task handle that lets the task keep running after the handle is dropped. `smol::spawn()` cancels the task when the task handle is dropped.
General async cancellation requires infrastructure mechanisms, and there are multiple reasonable designs.
Letting things settle in the ecosystem is a great way to find the best design to eventually incorporate in the standard library, but it takes time.
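To make the spawn example above concrete, here is a small sketch of the two behaviours (assumes the `tokio` crate with the `full` feature and the `smol` crate; it is not taken from either project's docs):

    use std::time::Duration;

    fn main() {
        // Tokio: dropping the JoinHandle detaches the task; it keeps running.
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(async {
            let handle = tokio::spawn(async {
                tokio::time::sleep(Duration::from_millis(50)).await;
                println!("tokio task still ran after its handle was dropped");
            });
            drop(handle); // detached, not cancelled
            tokio::time::sleep(Duration::from_millis(100)).await;
        });

        // smol: dropping the Task cancels it; .detach() would keep it alive.
        smol::block_on(async {
            let task = smol::spawn(async {
                smol::Timer::after(Duration::from_millis(50)).await;
                println!("this line is never reached");
            });
            drop(task); // cancelled here
            smol::Timer::after(Duration::from_millis(100)).await;
        });
    }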
I think it's a "cultural" thing. With Go you often find developers/projects proudly mentioning that few if any non-std dependencies are used. Coming from Go it really feels strange when you see pages of dependencies scrolling over your screen when you build a Rust project.
Go has a fatter standard library and a "fat" runtime with built-in green threads (an asynchronous runtime basically) and garbage collection, so you get more out of the box and thus end up using fewer dependencies.
I have yet to come across a go project that doesn't pull in tons of 3rd party code as well. It seems like maybe you're over-stating the "culture" a bit.
Yeah, while I’ve seen some great libraries that follow the practice of minimizing their dependencies, I’m a bit annoyed by the number of dependencies that docker will bring along [1]. I’ve been on the lookout for alternatives for my docker needs, but the state of podman, buildah and some others that I checked is similar. They all bring in roughly the same number of dependencies… if anyone knows of a stripped-down Go lib that can be used to build from a Dockerfile, pull, and run a container, I would be grateful for any suggestions. Heck, docker / moby isn’t even using go.mod properly.
Wow, that's massive. I guess it's inevitable that a popular piece of open-source software for end-users will be compelled to accrue dependencies due to popular demand for features that require them.
I feel Telegraf made a good compromise: out of the box, it comes with a _ton_ of stuff[1] to monitor everything, but they make it possible to build only with the pieces that you need via build tags, and they even provide a tool to extract said tags from your telegraf config[2]. But lots of supply-chain security stuff assumes everything in go.mod is used, so that can result in a lot of noise.
Thanks! That’s an interesting approach. Haven’t seen that before. I think a better approach (in a monorepo) might be to use separate go.mod files for each module, allowing the user to configure only the needed parts separately. But I haven’t seen it used much.
Big things you use off-the-shelf libraries for. Small things you open-code, possibly by cribbing from suitably-licensed open source libraries. You bloat your code to some degree, but reduce your need to audit external code and reduce your exposure to supply chain attacks. Still, the big libraries are a problem, but you're not going to open code everything.
Didn't computer science hype up code reuse for decades before it finally started happening on a massive scale? For that to actually happen we needed programming languages with nice namespaces and packaging and distribution channels. C was never going to have the library ecosystem that Java, C++, and Rust have. Now that we're there suddenly we have a very worrisome supply chain issue, with major Reflections on Trusting Trust vibes. What to do? We can't all afford to open-code everything, so we won't, but I recommend that we open-code all the _small_ things, especially in big projects and big libraries. Well, or maybe the AI revolution will save us.
I wonder how much good a “dependency depth” label on packages would do, at the crates.io level. Like, a package can only depend on a package with a lower declared dependency depth than it, and packages compete to have a low dependency depth as a badge.
I recently wrote an extremely basic Rust web service using Axum. It had 10 direct dependencies for a total of 121 resolved dependencies. I later rewrote the service in Java using Jetty. It had 3 direct dependencies for a total of 7 resolved dependencies. Absolutely nuts.
I don't think number of dependencies is a useful comparison metric here. Java runtime already implements stuff that you have to use libraries for in Rust, and it's a design choice. Rust also has slimmer std. Both languages have different constraints for this.
I agree with your general point, but for this specific functionality, I’ll point out that setting environment variables of the current process is unsafe. It took us a long time to realize it so the function wasn’t actually marked as unsafe until the Rust 2024 edition.
What this means in practice is that the call to invoke dotenv should also be marked as unsafe so that the invoker can ensure safety by placing it at the right place.
If no one is maintaining the crate, that won’t happen and someone might try to load environment variables at a bad time.
ok, I'm hooked - how is setting an env var in the current process unsafe? My gut says it's not unsafe in a memory-ownership sense, but rather in a race condition sense?
whatever the issue is, "setting an env var is unsafe" is so interesting to me that I'm now craving a blog post explaining this
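It is the memory-safety kind, at least on common libc implementations: `setenv` may reallocate or free the environment block while another thread is reading it through `getenv`, so a concurrent read can become a use-after-free. That is why the Rust 2024 edition made the call itself unsafe; a minimal sketch of what callers now have to write (not code from any dotenv crate):

    fn main() {
        // SAFETY: we are still single-threaded here, so no other thread can
        // be reading or writing the environment concurrently.
        unsafe {
            std::env::set_var("APP_MODE", "test");
        }
        assert_eq!(std::env::var("APP_MODE").as_deref(), Ok("test"));
    }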
Ironically a project that hasn't been changed in a while "unmaintained" is a good candidate for bumping to v1, while a project with new breaking commits every day is a bad candidate.
On the other hand, loading .env into the environment is security-critical (since you are usually passing secrets through .env). I wouldn't want to maintain that myself instead of sharing it with xxK other projects, in case there is a vulnerability.
All the comments and suggestions for improving rust dependency handling seem useful to me. To deal with dependency sprawl now, until the situation changes, I use a number of tools. To avoid having to set this up for each new project, I've made a template project that I simply unzip to create new rust projects.
The tools I have found useful are:
cargo outdated # check for newer versions of deps
cargo deny check # check dependency licenses
cargo about # generate list of used licenses
cargo audit # check dependencies for known security issues
cargo geiger # check deps for unsafe rust
I haven't found a cargo tool I like for generating SBOMs, so I installed syft and run that.
cargo install-update # keep these tools updated
cargo mutants # not related to deps, but worth a mention, used when testing.
Having configured all these tools once and simply unzipping a template works well for me.
Suggestions for different or additional tools welcome!
Disclaimer: I'm not a professional rust developer.
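In case it helps anyone copying this setup: the deny.toml that `cargo deny check` reads can start out very small (illustrative values only, tune the policy to your own project):

    [licenses]
    allow = ["MIT", "Apache-2.0"]

    [advisories]
    yanked = "deny"

    [bans]
    multiple-versions = "warn"

    [sources]
    unknown-registry = "deny"
    unknown-git = "deny"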
Rust at least has a partial remedy to this problem: feature flags. Many libraries use them to gate features which would otherwise pull in extra dependencies. (In fact I believe there is specific support for flags which correspond to dependency names.)
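Concretely, each optional dependency gets an implicit feature of the same name unless a feature refers to it with the `dep:` prefix. A quick sketch (crate choices only for illustration):

    # Neither crate is compiled unless the `json` feature is enabled.
    [dependencies]
    serde = { version = "1", optional = true }
    serde_json = { version = "1", optional = true }

    [features]
    json = ["dep:serde", "dep:serde_json"]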
> I can't rewrite the world, an async runtime and web server are just too difficult and take too long for me to justify writing for a project like this (although I should eventually, just for a better understanding).
I use safina+servlin and 1,000 lines of Rust to run https://www.applin.dev, on a cheap VM. It serves some static files, a simple form, receives Stripe webhooks, and talks to Postgres and Postmark. It depends on some heavy crate trees: async-fs, async-net, chrono, diesel, rand (libc), serde_json, ureq, and url.
2,088,283 lines of Rust are downloaded by `cargo vendor` run in the project dir.
986,513 lines using https://github.com/coreos/cargo-vendor-filterer to try to download only Linux deps with `cargo vendor-filterer --platform=x86_64-unknown-linux-gnu`. This still downloads the `winapi` crate and other Windows crates, but they contain only 22k lines.
976,338 lines omitting development dependencies with `cargo vendor-filterer --platform=x86_64-unknown-linux-gnu --keep-dep-kinds=normal`.
754,368 lines excluding tests with `cargo vendor-filterer --platform=aarch64-apple-darwin --exclude-crate-path='*#tests' deps.filtered`.
750k lines is a lot to support a 1k-line project. I guess I could remove the heavy deps with another 200 hours of work, and might end up with some lean crates. I've been waiting for someone to write a good threaded Rust Postgres client.
I've come to accept that I wasn't really developing in "rust", but in "tokio-rust", and stopped worrying about async everywhere (it's not fundamentally different from what happens with other languages that have async).
Why the need to go back to threaded development?
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust.
Out of those 3.6 million lines, how many are lines of test code?
I'm quite careful to tightly control the dependencies of Tokio. All dependencies are under control by members of the Tokio team or others that I trust.
What actually surprised me in Rust, is the amount of fragmentation and abandoned libraries. For example, serde_yaml is archived and there are two other libraries that do the same (?) thing. It seems like there's a significant effort required to search for and decide which (if at all) library to use. This is not so much pronounced in Go.
Yeah, one problem in Rust is that a number of very fundamental ecosystem libraries are written by a handful of high-profile people. Often people who are also working on the standard library or the Rust compiler. Rust developers usually know their names and SoMe handles.
It's a problem because those people become overworked, and eventually have to abandon things. The deprecation of `serde_yaml` was and is a huge, huge problem, especially without any functional replacement. There was no call for new maintainers, or for someone to take over the project. I can understand the reasons why (now you're suddenly auditing people, not code), but it sucks.
How is cargo more integrated into the language than Go’s? I’ve little to no experience with Rust, but Go’s package management seems pretty fully integrated to me.
You can audit your dependencies for crates with security vulnerabilities reported to the RustSec Advisory Database, also block unmaintained crates, and enforce your license requirements using SPDX expressions with cargo-audit and cargo-deny.
You can ensure that third-party Rust dependencies have been audited by a trusted entity with cargo-vet.
And you should have taken a look at where those 3M LoC come from; it's usually Microsoft's windows-rs crates that are transitively included in your dependencies through default features and build targets of crates built to run on Windows.
The solution is strong compile time and runtime guarantees about code behavior.
The author is right there's no way an individual can audit all that code. Currently all that code can run arbitrary build code at compile time on the devs machine, it can also run arbitrary unsafe code at runtime, make system calls, etc..
Software is not getting simpler, the abundance of high quality libraries is great for Rust, but there are bound to be supply chain attacks.
AI and cooperative auditing can help, but ultimately the compiler must provide more guarantees. A future version of Rust should come with an inescapable effect system. Work on effects in Rust has already started; I am not sure if security is a goal, but it needs to be.
The (terrible) solution that we are seeing now is generative AI. Instead of importing a library, you ask an AI to write the code for you, the AI most likely has ingested a library that implements the features you need and will essentially copy-paste that part into your code, transforming it so that it matches the rest of your code.
I believe that it causes more problems than it solves, but it can be a solution to the problem of adding thousands of lines of code of dependency when you could write a 10-line function yourself.
Of course, the proper thing to do is not to be the wrong kind of lazy and to understand what you are doing. I say the wrong kind of lazy because there is a right kind of lazy, and it is about not doing things you don't need to, as opposed to doing them poorly.
That proposal is not exactly this; that seems to propose a "blessed crates" namespace which includes popular open-source libraries. I read this proposal as a Python-style batteries-included stdlib.
What the OP proposes is not exactly a bigger stdlib, because they mention it should have "relaxed stability guarantees". Or is python allowed to change their stdlib in backwards-incompatible ways?
It does happen - after a deprecation period, usually small changes, but frequently (i.e. in every minor version there will surely be someone directly affected). More recently there were entire swaths of modules removed - still a conservative change, because we're talking mainly about support for obscure file formats and protocols that hardly anyone has used this century (see https://peps.python.org/pep-0594/ for details - I may be exaggerating, but not by a lot).
Historically this process has been mostly informal; going forward they're trying to make sure that things get removed at a specific point after their deprecation. Python has also now adopted an annual release cadence; the combination of that with the deprecation policy effectively makes their versioning into a pseudo-calver.
I'm outside of the Rust community, so my two cents are worthless - but in this thread it seems a lot of people actually want a de facto app framework, not necessarily a bloated "kitchen sink" style stdlib.
The stdlib probably should remain simple, in my opinion. The complexity should be optional.
I agree. Unfortunately, I think that a lot of the people who ask for a bigger standard library really just want (a) someone else to do the work (b) someone they can trust.
The people working on Rust are a finite (probably overextended!) set of people and you can't just add more work to their plate. "Just" making the standard library bigger is probably a non-starter.
I think it'd be great if some group of people took up the very hard work to curate a set of crates that everyone would use and provide a nice façade to them, completely outside of the Rust team umbrella. Then people can start using this Katamari crate to prove out the usefulness of it.
However, many people wouldn't use it. I wouldn't because I simply don't care and am happy adding my dependencies one-by-one with minimal feature sets. Others wouldn't because it doesn't have the mystical blessing/seal-of-approval of the Rust team.
The "Rust core team" should be working on the "Rust core", not every little thing that someone somewhere thinks should go in a standard library. It is part of the job of a "core team" to say "no".
A lot.
Like, a lot a lot a lot. Browse through any programming language that has an open issue tracker for all the closed proposals sometime. Individually, perhaps a whole bunch of good ideas. The union of them? Not so much.
This is obviously the best solution for Rust. A 'metalibrary' library type would add a lot of value to the ecosystem as a nexus:
- All included crates can be tested for inter-compatibility
- Release all included crates under a single version, simplifying upgrades
- Sample projects as living documentation to demo integrations and upgrades
- Breaking changes can be held until all affected crates are fixed, then bump all at once
- An achievable, valuable, local goal for code review / crev coverage metrics
There could be general "everything and the kitchen sink" metalibraries, metalibraries targeted at particular domains or industries, metalibraries with different standards for stability or code review, etc. It might even be valuable enough to sell support and consulting...
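As a sketch of what the facade itself could look like (the crate set is purely illustrative, not an existing project): the metalibrary's Cargo.toml pins exact versions of the vetted crates, and its lib.rs simply re-exports them, so downstream projects see one crate at one version.

    // lib.rs of a hypothetical metalibrary crate; its Cargo.toml would pin
    // exact versions of each re-exported crate, e.g. serde = "=1.0.219".
    pub use reqwest;
    pub use serde;
    pub use serde_json;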
No way. I'd much prefer we have a constellation of core companion libraries like Google's Guava.
We do not need to saddle Rust with garbage that will feel dated like Python's standard library. Cargo does the job just fine. We just need some high quality optional batteries.
Embedded projects are unlikely to need standard library bloat. No_std should be top of mind for everyone.
Something that might make additional libraries feel more first class: if cargo finally got namespaces and if the Rust project took on "@rust/" as the org name to launch officially sanctioned and maintained packages.
Python's standard library is the main reason python is usable.
Python packaging is somehow a 30 year train crash that keeps going, but the standard library is good enough that I can do most things without dependencies or with very small number of them.
I don't think an additional standard library layer, whatever you call it, has to have the same tight controls on backwards compatibility and evolution that the actual standard library has. IMO the goal of creating it should be to improve supply chain security, not to provide an extremely stable API, which might be more of a priority at lower levels but chokes off the kind of evolution that will be needed.
I think what you're suggesting is a great idea for a new standard library layer, you're just not using that label. A set of packages in a Rust namespace, maintained by the same community of folks but under policies that comply with best practices for security and some additional support to meet those best practices. The crates shouldn't be required, so no_std should work just as it would prior to such a collection.
I develop for Linux, Mac, and Windows. Multiple architectures and OSes. I rarely see platform issues with Rust. It's typically only stuff at the edge, like CUDA libraries, that trip up cross-platform builds.
Rust, as a systems language, is quite good at working on a variety of systems.
It starts with Rust not supporting architectures that aren't available in LLVM but are in GCC; otherwise, having a Rust frontend project for GCC wouldn't be a thing.
As for the systems-language remark, I am still looking forward to the day when sorting out ABI issues for binary libraries no longer needs to go through solutions designed for C and C++.
> We do not need to saddle Rust with garbage that will feel dated like Python's standard library.
Python's standard library is a strength, not a weakness. Rust should be so lucky. It's wonderful to have basic functionality which is guaranteed to be there no matter what. Many people work in environments where they can't just YOLO download packages from the Internet, so they have to make do with whatever is in the stdlib or what they can write themselves.
> Python's standard library is a strength, not a weakness. Rust should be so lucky.
Rust is luckier. It has the correct approach. You can find every battery you need in crates.io.
Python has had monstrosities like urllib, urllib2, http, etc. All pretty much ignored in favor of the external requests library and its kin. The standard library also has inconsistencies in calling conventions and naming conventions and it has to support those *FOREVER*.
The core language should be pristine. Rust is doing it right. Everything else you need is within grasp.
> The standard library also has inconsistencies in calling conventions and naming conventions and it has to support those *FOREVER*.
Not to mention abysmal designs inspired by cargo-cult "OOP" Java frameworks from the 90s and 00s. (Come on, folks. Object-oriented programming is supposed to be about objects, not about classes. If it were about classes, it would be called class-oriented programming.)
bigstrat2003's argument is approximately "Python is batteries included"
My counter argument is that the "batteries included" approach tends to atrophy and become dead weight.
Your counter seems to be "that's not an argument, that's just Rust hype."
Am I interpreting you correctly? Because I think my argument is salient and correct. I don't want to be stuck with dated APIs from 20 years of cruft in the standard library.
The Python standard library is where modules go to die. It has two test frameworks nobody uses anymore, and how many XML libraries? Seven? (The correct answer is "four", I think. And that's four too many.) The Python standard library has so much junk inside, and it can't be safely removed or cleaned up.
A standard library should be data structure/collections, filesystem/os libraries, and maybe network libraries. That's it. Everything else changes with too much regularity to be packed in.
Your critique doesn't match the reality of Python users.
There is a single datetime library. It covers 98% of use cases. If you want the final 2% with all the bells and whistles you can download it if you wish.
There is a single JSON library. It's fast enough for almost anything you want. If you want faster libraries with different usability tradeoffs you can use one but I have never felt compelled to do so.
Same thing with CSV, filesystem access, DB api, etc. They're not the best libraries at the time of any script you're writing, but the reality is that you never really need the best, most ergonomic library ever to get you through a task.
Because of this, many big complex packages like Django have hardly any external dependencies.
If anything you're not the one getting stuck with dated APIs; it's the Python core devs. Maintainers of other packages are always free to choose other dependencies, but they almost invariably find that the Python stdlib is good enough for everything.
The Python datetime library is legacy software and has terrible ergonomics, terrible safety, and heinous pitfalls. It's one of my least favorite in the industry.
Python is packed full with this shit. Because it wasn't carefully planned and respect wasn't given to decisions that would last forever.
Python has two testing frameworks baked in, neither of which is good.
Python has historically had shitty HTTP libraries and has had to roll out several versions to fix the old ones because it couldn't break or remove the old ones. Newbies to the language will find those built in and will write new software with the old baggage.
Batteries included is a software smell. It's bad. You can't change the batteries even after they expire.
> The Python datetime library is legacy software and has terrible ergonomics, terrible safety, and heinous pitfalls. It's one of my least favorite in the industry.
Your arguments seem to come from someone who doesn't have substantial software engineering experience in large systems.
All large software systems and most effective software uses libraries that are not generally super modern and not necessarily the best of the best, but they are well-understood.
In your example for datetime libraries, notice that the writer immediately ignores libraries that at some point were better than the stdlib library, but are now unmaintained. That by itself is already a red flag; it doesn't matter that a library is better if there is a large risk that it is abandoned.
Notice that no single library in the examples mentioned solves all the problems. And notice that there is no such thing as a datetime library anywhere that has consistent, uniform and order-of-magnitude improvements such that they merit dropping the stdlib.
The stdlib is _good enough_. You can build perfectly good business systems that work reasonably well, and as long as you have a couple of basic ideas down about how you lay out datetime usage you'll be mostly fine. I've been working with Python for over 15 years and any time I picked a different datetime library it was just an additional maintenance burden.
> But now you're stuck with it forever.
You're "stuck" with whatever datetime library you choose. One day your oh-so-great datetime library is going to be legacy and you'll be equally bamboozled in migrating to something better.
I've heard this argument about SQLAlchemy, the Django ORM, and various other packages. The people that chose to go somewhere less maintained are now stuck in legacy mode too.
> Python is packed full with this shit. Because it wasn't carefully planned and respect wasn't given to decisions that would last forever.
This is pure ignorance. There's not a single language standard library that is absolutely amazing. Yet the batteries-included approach ends up being a far better solution long term when you look at the tradeoffs from an engineering perspective.
> Python has two testing frameworks baked in, neither of which is good.
They are good enough. They have broad support with tons of plugins. They have assertions. They get the basics right and I've had success getting useful tests to pass. This is all that matters; your tests don't become magically better because you decided to use nose or whatever framework of the day you choose.
> Python has historically had shitty HTTP libraries and has had to roll out several versions to fix the old ones because it couldn't break or remove the old ones. Newbies to the language will find those built in and will write new software with the old baggage.
The current python docs recommend requests, and requests is a well-established package that everyone uses and is not at risk of being outdated, as it's been the go-to standard for over a decade. This is fine. If you're a library writer you're better off using urllib3 and avoiding an additional dependency.
> Batteries included is a software smell. It's bad. You can't change the batteries even after they expire.
Try to revive a line-of-business Node.js app written 10 years ago with hundreds of outdated dependencies. An equivalent Python app will have a half dozen dependencies at most, and if you stuck to the most popular packages there's a really high chance an upgrade will be smooth and easy. I've done this multiple times; tons of colleagues have had to do this often. Python's decision makes this tremendously easy.
So sorry, if you're aiming for library perfection, you're not aiming for writing maintainable software. Software quality happens in the aggregate, not in choosing the fanciest, most modern thing.
The issue with that is how to get everyone to agree on how that would work, e.g. what the criteria for this extension would be, what's the policy for future changes, who will maintain all of this, etc etc.
Now instead of seeing millions of lines of inscrutable code in your program bloating binary sizes, you can see it in every program (that doesn't disable stdlib).
In every program that uses a particular feature from the stdlib. Given the same feature, I tend to trust stdlib more than some rando project. And if you don't trust the stdlib, why would you trust the compiler?
Indeed, yes sometimes this brings cruft into the mix.
However I rather have cruft that works everywhere the toolchain is fully implemented, instead of playing whack-a-mole with third party libraries when only some platforms are supported.
I think that the bare bones stdlib is a huge mistake in Rust. I would love to see that rectified. Unfortunately, approximately 5 other people share that view. The Rust community as a whole is very opposed to adding functionality to std.
I mean, the case against it is pretty strong. Many languages with maximalist standard libraries have tons of vestigial code that nobody uses because the ecosystem found better solutions. Yet that code has to be maintained in perpetuity.
The C++ standard library even has this problem for something as basic as formatting (iostreams), and now it has two solutions for the same problem.
That is a serious burden on the maintainers, it creates all kinds of different problems, especially if the functionality of the libraries assumes a certain execution environment. Rust doesn't just target x86 desktops.
And? Not every project had the same amount of resources.
There is a tradeoff here. Having a large, but badly maintained, standard library with varying platform support is worse than having a smaller, but well maintained, one.
The number of contributors is a totally meaningless metric.
1. Not every contributor contributes equally. Some contributors work full time on the project, some work a few hours a month.
2. The amount of contributors says nothing about what resources are actually required. Rust is, no doubt, a more complex language than go and is also evolving faster.
3. The amount of contributors says nothing about the amount of contributors maintaining very niche parts of the ecosystem.
Rust has a million ways to solve a specific problem, as it is not opinionated and gets you down to the lowest level if needed. On top of that there's a million ways to encode your types. Then there's a million ways to bind C libraries.
The solution space is basically infinite, and that's a good thing for a systems programming language. It's kind of amazing how far rust reaches into higher level stuff, and I think the way too easy to use package manager and lively crate ecosystem is a big part of that.
Sometimes I wish for a higher-level rust-like language though, opinionated as hell with garbage collector, generic functions without having to specify traits, and D's introspection.
No mention here of binary size (beyond linking out to a ClickHouse blog post on the topic).
The total number of lines of code is relevant, sure, but for most practical purposes, compile times and binary sizes are more important.
I don't know the situation in Rust, but in JS land, there's a pretty clear divide between libraries that are tree-shakable (or if you prefer, amenable to dead code elimination) and those that aren't. If you stick to tree-shakable dependencies your final bundled output will only include what you actually need and can be pretty small.
> The total number of lines of code is relevant, sure, but for most practical purposes, compile times and binary sizes are more important.
Perhaps for most practical purposes, but not for security, which the article's author seems more concerned with:
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust... How could I ever audit all of that code?
Rust does static dispatch even for non-free functions. Only trait objects are dynamically dispatched, and most people argue they’re under-used in Rust, not overused.
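For readers newer to Rust, the distinction in code looks like this (a generic sketch, not tied to any crate in this thread):

    trait Render {
        fn render(&self) -> String;
    }

    struct Plain;
    impl Render for Plain {
        fn render(&self) -> String { "plain".to_string() }
    }

    // Static dispatch: monomorphized per concrete type, calls can be inlined.
    fn draw_static<R: Render>(r: &R) -> String {
        r.render()
    }

    // Dynamic dispatch (trait object): one copy of the function, calls go
    // through a vtable at runtime.
    fn draw_dyn(r: &dyn Render) -> String {
        r.render()
    }

    fn main() {
        let p = Plain;
        assert_eq!(draw_static(&p), "plain");
        assert_eq!(draw_dyn(&p), "plain");
    }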
I had the same concerns when I started using Rust, but then I eventually embraced it, for better or worse. Cargo makes it so your build almost never breaks (it's happened maybe twice in the 8 years I've been doing Rust). Plus there are still way fewer vulnerabilities in Rust projects than in non-Rust projects, in spite of the crazy number of dependencies.
If I was to design a Rust 2.0, I'd make it so dependencies need permissions to access IO, or unsafe code, etc.
Totally agree with the core concern—dependency sprawl is no longer just a JS/Python issue, it's now visibly hitting Rust as well.
One thing I've observed while managing a mid-sized Rust codebase: cargo does a decent job with versioning, but the long tail of small, redundant crates (often differing only slightly) can still bloat the tree. The lack of a strong ecosystem-level curation layer makes it hard to know which crates are battle-tested vs. weekend hacks.
Maybe it’s time the community seriously considers optional “trust scores” or soft standards (similar to crates.io keywords, but more structured) to guide adoption. Not a gatekeeping mechanism—just more context at decision time.
Excuse me for not having much to add to the discussion but two interesting references for people to check out, if so inclined of course:
a) Ginger Bill (the Odin language creator, no affiliation) stated on a podcast that Odin will never have an official pkg manager, since what they're, in his opinion, mainly automating is dependency hell, and this being one of the main reasons for rising software complexity and lower software quality; see https://www.youtube.com/watch?v=fYUruq352yE&t=11m26s (timestamped to the correct position) (they mention Rust explicitly as an example)
b) another programmer rather seriously worried about software quality/complexity is Jonathan Blow, whose talk "Preventing the Collapse of Civilization" is worth watching in my opinion: https://www.youtube.com/watch?v=ZSRHeXYDLko (it's not talking about package managers specifically, but is on topic regarding software complexity/quality as a whole)
Addendum: And sorry, I feel like almost everyone knows this xkcd by now, but since no one so far seems to have posted it; "obligatory xkcd reference": https://imgs.xkcd.com/comics/dependency_2x.png
> a) Ginger Bill (the Odin language creator, no affiliation) stated on a podcast that Odin will never have an official pkg manager
The cognitive dissonance required to believe that Rust preventing you from dereferencing freed memory at compile time is overzealous nannying by the language authors -- while at the same time deliberately making code reuse harder for users because they could make engineering decisions he doesn't like -- is staggering.
> a)... Odin will never have an official pkg manager
Perhaps this explains why Odin has found such widespread usage and popularity. /s
> b)... Jonathan Blow, who's talk "Preventing the Collapse of Civilization"
With such a grandiose title, before I first watched I thought it must be satire. Turns out, it is food for the credulous. I believe Jonathan Blow is less "seriously worried about software quality/complexity" than he is about marketing himself as the "last great hope". At least Blow's software has found success within its domain. However, I fear Blow's problem is the problem of all intellectuals: “An intellectual is a person knowledgeable in one field who speaks out only in others.” Blow has plenty of opinions about software outside his domain, but IMHO very little curiosity about why his domain may be different than your own.
My own opinion is there is little evidence to show this is a software quality problem, and any assertion that is the case needs to compare the Rust model against the putatively "better" alternatives. Complex software, which requires many people to create, sometimes across great distances of time and space, will necessarily have and require dependencies.
Can someone show me a material quality difference between ffmpeg, VLC, and Samba dependencies and any sufficiently complex Rust program (even which perhaps has many more dependencies)?
~ ldd `which ffmpeg` | wc -l
231
Now, large software dependency graphs may very well be a security problem, but it is a problem widely shared with all other software.
"Perhaps this explains why Odin has found such widespread usage and popularity. /s"
What an unnecessarily snarky and dismissive comment to make about someone's work.
- I'd say within a certain niche Odin is becoming well known and gets its use
- you do realize using an `Odin package` is putting a program into a sub-folder and that's it
- It comes with a rich stdlib + vendor libraries out of the box
- and isn't it kind of up to the creators how to design and promote their language
I'd even argue it's laudable a language doesn't promote itself as a "fixes everything use me at all costs" kind of technology. The creator himself tells people it might not be the right tool for them/their use case, encourages them to try other languages too, sometimes outright tells them Odin doesn't fit their needs and xyz would probably do better.
Odin is pragmatic & opinionated in its language design and goal. Maybe the lack of a package manager is the basis for you to disregard a programming language, for plenty of others (and likely more Odin's target group) it's the least of their concerns when choosing a language.
> What an unnecessarily snark and dismissive comment to make about someone's work.
The snark was intended, however any dismissiveness concerning Ginger Bill's effort was not. However, when you make a decision like "Odin will never have a package manager", you may be choosing to condemn your project to niche status, in this day and age. Now, niche status is fine, but it definitionally comes with a limited audience. Like "this game will only ever be a text based roguelike."
I see a lot of concern like this about dependencies, mostly in node. I'm sure it's an issue I'm just not convinced it's as big of a problem as people say. We have scanners that can help keep your dependencies secure automatically. If you take a dependency and it goes unmaintained is it really that much worse than the relevant code in your own codebase going unmaintained?
Vendoring is a step in the right direction, you’ve constrained one side of the equation.
But you’re still open to typo squatting and similar issues like crates falling unmaintained - the article mentions the now famous dotenv vs. dotenvy issue (is this solvable with a more mature governance model for the crates ecosystem? At this point dotenv should probably be reclaimed). So after vendoring a baseline set of dependencies, you need to perform comprehensive auditing.
Maybe you can leverage LLMs to make that blob of vendored deps smaller / cheaper to own. Maybe you can distill out only the functionality you need (but at what cost, now you might struggle to backport fixes published upstream). Maybe LLMs can help with the auditing process itself.
You need a stream of notifications of upstream fixes to those vendored deps. Unfortunately in the real world the decision making will be harder than “ooh, there’s a sec fix, I should apply that”.
I always wonder why someone like JFrog don’t expand their offering to provide “trusted dependencies” or something similar. I.e. you pay to outsource that dependency governance and auditing. Xray scanning in the current product is a baby step toward the comprehensiveness I’m suggesting.
Taking a step back though, I’d be really careful not to throw the baby out with the bath water here. Rust has a fairly unique capability to compose work product from across unrelated developers thanks to its type system implementation (think about what happens with a C library, who’s responsible for freeing the memory, you or me?). Composition at scale is rusts super power, at least in terms of the productivity equation for large enterprises - in this context memory safety is not the sales pitch since they already have Java or whatever.
I agree that there are too many dependencies in Rust. I support the idea of adding some of the more popular crates to std. Many applications use something like tracing, tracing-subscriber, and basic server/client functionality. It would be great to have simple, minimal-feature implementations of these in std — similar to how Go does it. If someone needs a more complex system, they can still use an external crate, but having basic building blocks in std would really help.
This is my first encounter with OSGi. It seems to me that the "Lego hypothesis" reflects an increasingly justified approach. The ACM Queue article mentions hot plugging and dependency injection, and a comment[0] in this thread brings up Sans IO. This also ties into capabilities, as a security measure but also as an approach to modularity. The common thread is that programs should be written with a strong sense of boundaries: both what is included and what is not included is vital, and the boundary must allow the inside to communicate with the outside. Push dependencies to the boundary and create interfaces from them. The general principles for trivially pluggable components are all out there now. More efforts like OSGi will be needed to put the principles into practice.
I used to be firmly in the component oriented camp. The reality of the matter is that the conceptual (mental) model doesn't really represent the reality of composing with reusable components.
All Lego components have the same simple standard mechanism: friction coupling using concave and convex surface elements of the component. Unix pipes are the closest thing we have to a Lego like approach and there the model of "hooking pipes of bytes from sources to sinks" actually represents what happens with the software.
With components and APIs, unless we resort to some universal baseline (such as a small finite semantic API like REST's "verbs") that can basically marshall and unmarshall any arbitrary function call ('do(func, context, in-args, out-args, out-err)'), the Lego metaphor breaks down very quickly.
The second issue is the modalities of 'interactions' between components. So this is my first encounter with "Sans-IO" (/g), but this is just addressing the interactions issue with a fiat 'no inter-actions by components'. So Lego for software: a great overall expression of desired simplicity, but not remotely effective as a generative concept and imo even possibly detrimental (as it oversimplifies the problem).
Now we have 2 different pieces of software tech that somewhat have managed to arrive at component orientation: using a finite set of predefined components to build general software. One is GUI components, where a small set of visual components and operational constructs ("user-events", etc.) with structural and behavioral semantics are used to create arbitrary visual interfaces for any ~domain. The other is WWW where (REST verbs of) HTTP also provide a small finite set of 'components' (here architectural) to create arbitrary services. With both, there is the tedious and painful process of mapping domain semantics to structural components.
So we can get reusable component oriented software (ecosystems) but we need to understand (per lessons of GUIs and WebApps) that a great deal of (semantic) glue code and infrastructure is necessary, just as a lot of wiring (for GUIs) and code frameworks (for WebApps) are necessary. That is what something like OSGi brings to the table.
This then leads to the question of component boundary and granularity. With things like DCOM and JEE you have fine-grained components aggregated within process boundaries. The current approach is identifying the process boundary as the component boundary (docker, k8s, microservices) (and doing away with 'application servers' in the process).
> With components and APIs, unless we resort to some universal baseline (such as a small finite semantic API like REST's "verbs") that can basically marshall and unmarshall any arbitrary function call ('do(func, context, in-args, out-args, out-err)'), the Lego metaphor breaks down very quickly.
I agree that this is generally what happens, and I would like to suggest that there is a better, harder road we should be taking. The work of programming may be said to be translation, and we see that everywhere: as you say, mapping domain semantics (what-to-do) to structural components (how-to-do-it), and compilers, like RPC stub generation. So while a few verbs along the lines of RPC/REST/one-sided async IPC are the domain of the machine, we programmers don't work well with that. It's hard to agree, though, and that's not something I can sidestep. I want us to tackle the problem of standardization head-on. APIs should be easy to define and easy to standardize, so that we can use richly typed APIs with all the benefits that come from them. There's the old dream of making programs compose like procedures do. It can be done, if we address our social problems.
> The second issue is the modalities of 'interactions' between components. So this is my first encounter with "Sans-IO" (/g), but this is just addressing the interactions issue with a fiat 'no inter-actions by components'. So Lego for software: a great overall expression of desired simplicity, but not remotely effective as a generative concept and imo even possibly detrimental (as it oversimplifies the problem).
I'm not sure what you mean, so I may be going off on a tangent, but Sans IO, capabilities, dependency injection etc. are more about writing a single component than any inter-component code. The part that lacks IO and the part that does IO are still bundled (e.g. with component-as-process). There is a more extensive mode, where whoever controls a local subsystem of components decides where to put the IO manager.
> Now we have 2 different pieces of software tech that somewhat have managed to arrive at component orientation: using a finite set of predefined components to build general software.
> So we can get reusable component oriented software (ecosystems) but we need to understand (per lessons of GUIs and WebApps) that a great deal of (semantic) glue code and infrastructure is necessary, just as a lot of wiring (for GUIs) and code frameworks (for WebApps) are necessary.
I agree, which is why I want us to separate the baseline components from more powerful abstractions, leaving the former for the machine (the framework) and the latter for us. Does the limited scope of HTTP by itself mean we shouldn't be able to provide more semantically appropriate interfaces for services? The real issue is that those interfaces are hard to standardize, not that people don't make them.
> I agree, which is why I want us to separate the baseline components from more powerful abstractions, leaving the former for the machine (the framework) and the latter for us. Does the limited scope of HTTP by itself mean we shouldn't be able to provide more semantically appropriate interfaces for services? The real issue is that those interfaces are hard to standardize, not that people don't make them.
We're likely in general agreement in terms of technical analysis. Let's focus on the concrete metric of 'economy' and hand-wavy metric of 'natural order'.
Re the latter, consider the thought that 'maybe the reason it is so difficult to standardize interfaces is because it is a false utopia?'
Re the former, the actual critical metric is 'is it more economical to create disposable and ad-hoc systems, or to amortize the cost of a very "hard" task across 1 or 2 generations of software systems and workers?'
Now the industry voted with its wallets and the blog propaganda of 'fresh engineers' with no skin in the component-oriented approach in the early '00s. That entire backlash, which included the "noSQL" movement, was in fact, historically, a shift mainly motivated by economic considerations, aided by a few black swans like Linux and containerization. But now the 'cost' of the complexity of assembly, deployment, and orchestration of a system based on that approach is causing information overload for the workers. And now we have generative AI, which seems to further tip the economic balance in favor of the late-stage ad-hoc approach to putting a running system together.
As to why I used 'natural order'. The best "Lego like" system out there is organic chemistry. The (Alan) Kay vision of building code like nature builds organisms is of course hugely appealing. I arrived at the same notions independently when younger (post architecture school) but what I missed then and later realized is that the 'natural order' works because of the stupendous scales involved and the number of layers! Sure, maybe we can get software to be "organic" but it will naturally (pi) present the same perplexity to us as do biological systems. Do we actually fully understand how our bodies work?
I see two points: safety (a bigger supply-chain attack surface) and code bloat / compiler performance. The latter has been discussed in numerous posts here (the whole idea of a linker from the start was to get rid of unused functions, so not a big problem imo). The safety concern is a serious and legit consideration, but we also rely on Linux and build tools to build things. How do you know the compiler that was used to build Linux hasn't been compromised, perhaps several generations ago, and now your Linux has a backdoor that is not in the Linux source code? There was a research paper on this IIRC. We trust the ecosystem to validate each tool we use. We just have to do the same with our own projects - only use what's relevant, and we should do dependency hygiene to check if it is coming from a reputable source...
Dependency and build management is a fascinating and still unsolved problem in software engineering (in some sense it is the central problem).
I am wondering if there is a good modern reference that provides a conceptual overview or comparative study of the various techniques that have been attempted.
It is a hard subject to define as it cuts through several layers of the stack (all the way down to the compiler-system interface layer), and most books focus on one language or build technology rather than providing a more conceptual treatment of the techniques used.
Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of Rust. Removing the vendored packages reduces this to 11,136 lines of Rust.
Tokei hasn't had a stable release in over 4 years and misreports lines of code in some instances. The author in the past has basically said they would need to be paid to backport one line fixes with no merge conflicts that fix real accuracy issues in their software... Bad look in my book.
Everyone is in such a rush to get their project out the door, no one has time to generate a key and properly code sign releases and begin developing a more secure chain. Now we have JS package "whatever code" ecosystem but for Rust. As if we haven't watched NPM get hacked many times over the last decade or so.
> Everyone is in such a rush to get their project out the door
This is the cause of so many issues.
And it's not like we're at war or trying to cure the next pandemic; we're writing CRUD apps and trying to convince people to click on ads for crap they don't need.
> As if we haven't watched NPM get hacked many times over the last decade or so.
When has this happened? The only one I remember is the event-stream thing, and that was what, over five years ago? Doesn't seem all that common from what I can see?
3.6M lines of code seems so much that it sets off my "are you sure that's counted right?" alarm.
I'm not very familiar with Rust, but all of Go is 1.6M lines of Go code. This includes the compiler, stdlib, tests for it all: the lot.
Not that I doubt the sincerity of the author of course, but maybe some irrelevant things are counted? Or things are counted more than once? Or the download tool does the wrong thing? Or there's tons of generated code (syscalls?)? Or ... something? I just find it hard to believe that some dependencies for web stuff in Rust are twice the size of all of Go.
> This whole fiasco led me tho think .... do I even need this crate at all? 35 lines later I had the parts of dotenv I needed.
"A little copying is better than a little dependency." - grab the parts that you need and then include the library only in a test to ensure alignment down the line, an idea I liked a lot.
I think the main problem is that you should be able to run dependencies inside their own sandbox, and the language focuses only on memory safety within a monolithic program.
the problem is if you put library dependencies in their own sandbox you have a different kind of interface (much more limited) for libraries
like e.g. if we look at sandbox boundaries we have:
- some in-language permission enforcement (e.g. the Java Security Manager) -- this approach turned out to be a very bad idea
- process boundaries, i.e. take the boundary the OS enforces and lock it down more (e.g. by stuff like pledge, cgroups etc.) -- this approach turned out okayish
- VM boundaries (e.g. Firecracker VMs) -- turned out well
- emulation boundaries (e.g. WASM) -- mixed history, can turn out well especially if combined with worker processes which lock themselves down
but what that means in practice is that reliably sandboxing library dependencies will most likely introduce more or less IPC-like boundaries between the caller and the library
which in practice makes it unsuited for a lot of things
e.g. for most utility libs it's very unsuited
e.g. for a lot (but not all) data structure libs it's unsuited and might be a huge issue
e.g. you can apply it to a web server, but then you are basically reinventing CGI/AGI, which is okay but can't quite compete on performance
e.g. you can't apply it to a fundamental runtime engine (e.g. tokio); worse, you now might have one copy of the engine running per sandbox... (but you can apply it to some sub-parts of tokio's internals)
People have tried this a lot in various ways.
But so far this always died off in the long run.
Would be nice if the latest push based around WASM would have some long term success.
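For a flavour of what that boundary looks like in practice, here is a minimal sketch using the wasmtime and anyhow crates (both assumed dependencies; the "library" is a toy module written inline). Note how the interface narrows to typed scalar calls across the sandbox boundary, which is exactly the limitation described above.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // In a real setup this would be a separately compiled dependency,
    // not a toy module written inline as WAT.
    let module = Module::new(
        &engine,
        r#"(module
             (func (export "add") (param i32 i32) (result i32)
               local.get 0
               local.get 1
               i32.add))"#,
    )?;

    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    // The only way to talk to the sandboxed "library": typed calls over
    // a narrow boundary, not rich in-language types.
    let add = instance.get_typed_func::<(i32, i32), i32>(&mut store, "add")?;
    println!("2 + 3 = {}", add.call(&mut store, (2, 3))?);
    Ok(())
}
```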
> the problem is if you put library dependencies in their own sandbox you have a different kind of interface (much more limited) for libraries
Nobody said it would be easy. As an analogy, the borrow checker makes working with memory much more limited, yet some people like it because it makes things safer.
Thanks! This is a very detailed explanation of why existing sandboxing techniques will not work as expected for dependencies (wrt functionality or performance).
They should take a look at OPAM (OCaml’s package manager). There was a really impressive talk at the OCaml Workshop at POPL or ICFP a couple of years ago about how it works. Basically, they have a huge CI infrastructure and keep all versions of every package ever published. So, once you’ve found the right set of dependencies for your project, you can be sure the exact versions will always be available via OPAM.
As a fellow rust developer, I love our dependencies but I put a lot of effort into pruning the ones I want to use. If I see a crate using too many I might contribute to it or find a replacement.
If you want to use dependencies, I wouldn't be surprised when you realise they also want to use dependencies. But you can put your money/time in the right places. Invest in the dependencies that do things well.
A thought experiment for this writer: imagine if Tokio (and all its dependencies) were moved into the Rust standard library, so that it was more like Go. Would that make them more comfortable depending on it (not that they'd have a choice any more)? If so, why?
This is a general problem: devs pulling in libraries instead of writing a few lines of code. Those libraries pull in more dependencies that have even more dependencies.
LLM coding assistants are a partial solution. Recently I typed
`vec4 rgb2hsv(vec4 rbg)`
and a few tab-completes later it had filled in the body of the code with a correct color conversion routine. So that saved me searching for and pulling in some big-ass color library.
Most of lodash.js can be avoided with LLMs too. Lodash's loops are easier to remember than Javascript's syntax, but if your LLM just writes the foo.forEach((value, key) => {...}) for you, you can skip the syntactic sugar library.
> do I even need this crate at all? 35 lines later I had the parts of dotenv I needed.
I'm not saying you copy-pasted those 35 lines from dotenvy, but for the sake of argument let's say you did: now you can't automatically benefit from dotenvy patching some security issue in those lines.
Can't benefit from them patching a security issue, but don't suffer from
- them breaking something
- a supply chain attack
- them making a change which breaks your program
- you having accidentally relied on a bug or an unintended behavior of their code
(which they may fix at any moment)
- many unneeded LOC in your codebase
- absolution of ownership
- relying on a dependency versus having written it yourself
- in the latter case you'll automatically take responsibility
- think much more about code's security/quality
- have the knowledge to fix it and know exactly where to
(in your 35-lines of code you yourself wrote)
- more burdensome upgrades of your software
- longer compilation speeds
- having to monitor their program
- is it abandoned, ownership transferred to dubious party
- did the maintainer have a late night drunken stupor accepting bad pull requests
- did they react to a CVE or not
- did they change the license
- do they have a license but added their own problematic paragraph
- does the program "develop badly"
(change its target scope in any problematic way)
(take on more and more bloat, more unneeded functionality)
- having worse of an overview of your total dependencies
(since they may themselves rely on further crates you don't expect)
- ...
To benefit you have to actually trust the current and future maintainers of the package, its dependencies, the dependencies of its dependencies, etc. You can also automatically get breached in a supply chain attack, so it's a tradeoff
If you REALLY need such an update, you can easily subscribe to updates from the upstream project (in whatever way it allows) and patch your version when that rare situation occurs.
A large removable standard library seems to be the optimal solution. It is there by default for everyone, but if needed for embedded scenario it can be removed, leaving only the core language features.
When I am compiling Rust applications, I must admit I'm always rather bemused at the number of dependencies pulled in. Even what I'd have thought to be simple tools easily reach about 200 dependent packages. It's nightmarish. One way this becomes particularly apparent is if you're trying to create a reproducible package for Guix or Nix. You end up having to manually specify a package for every different Rust library because of how those systems require reproducible builds. The process of writing Guix packages for software has been extremely illuminating for me as to just how deeply nested certain technologies are vs. others. I'd be willing to bet it's a good metric for what sticks around. If you've got 200 dependencies, I don't think your software is gonna last the test of time. It seems a recipe for endless churn.
Python is in the same spot now that you can't easily install packages globally. I don't have the hard drive space to develop multiple projects anymore. Every single project takes up multiple gigs in just dependencies.
"Not thinking about package management careful makes me sloppy."
Isn't the point of a memory safe language to allow programmers to be sloppy without repercussions, i.e., to not think about managing memory and even to not understand how memory works?
Would managing dependencies be any different? Does Rust allow programmers to avoid thinking carefully about selecting dependencies?
> Isn't the point of a memory safe language to allow programmers to be sloppy without repercussions, i.e., to not think about managing memory and even to not understand how memory works
No. The point is that even the best programmers of unsafe languages regularly introduce both simple and subtle bugs into codebases while being careful about handling memory correctly, and therefore we should use languages that don't even allow those bugs for most every use case. Using these languages still allows crap programmers to waste GBs of correctly allocated and handled memory, and good programmers to write tight, resource-sipping code.
If careful programmers who can manage memory should use the same language as careless ones who cannot, then does this mean both should also automatically use third party libraries by default?
Are there systems languages that provide memory management but do not default to using third party libraries? If yes, then do these languages make it easier for programmers to avoid dependencies?
No, the point is to stop you from being sloppy. The code won't compile if you're sloppy with memory management.
You can be _relatively_ sure that you're not introducing memory unsafety by adding a dependency, but you can't be sure that it isn't malware unless you audit it.
> when checking a rust security advisory mentioning that dotenv is unmaintained
This is a problem with all languages and actually an area where Rust shines (due to editions). Your pulled in packages will compile as they previously did. This is not true for garbage collected languages (pun intended).
> Out of curiosity I ran toeki a tool for counting lines of code, and found a staggering 3.6 million lines of rust .... How could I ever audit all of that code?
Again, another area where Rust shines. You can audit and most importantly modify the code. This is not that easy if you were using Nodejs where the runtimes are behind node/v8 or whatever. You compile these things (including TLS) yourself and have full control over them. That's why Tokio is huge.
> This is not true for garbage collected languages
JavaScript is backwards compatible going back effectively forever, as is Java. Rust's unique system is having a way to make breaking changes to the language without breaking old code, not that they prioritize supporting old code indefinitely.
The libraries are a different story—you're likely to have things break under you that rely on older versions of libraries when you update—but I don't see Rust actually having solved that.
> You can audit and most importantly modify the code. This is not that easy if you were using Nodejs where the runtimes are behind node/v8 or whatever.
Node and V8 are open source, which makes the code just as auditable and modifiable as the 3.6 million lines of Rust. Which is to say, both are equally unapproachable.
> The libraries are a different story—you're likely to have things break under you that rely on older versions of libraries when you update—but I don't see Rust actually having solved that.
No language can fix that. However, I've lost count of the times my Python/JavaScript program fails at runtime because of something in one of the dependencies. Usually, it's not a JS/Python problem but rather has to do with a Node/Python version update. It always boils down to the "core" issue, which is the runtime. That's why I like that Rust gives me a "fixed" runtime that I download/compile/package with my program.
> Node and V8 are open source, which makes the code just as auditable and modifiable as the 3.6 million lines of Rust. Which is to say, both are equally unapproachable.
I've recently patched a weird bug under Tokio/Otel and can't imagine doing that with Node/V8 without it being a major hassle. It is relatively straightforward in Rust though requires maintaining your own fork of only the dependency/branch in question.
IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem. That's essentially where we are today both in language repositories for OSS languages and private monorepos.
This is partly due to how we've distributed software over the last 40 years. In the 80s the idea of a library of functionality was something you paid for, and painstakingly included parts of into your size constrained environment (fit it on a floppy). You probably picked apart that library and pulled the bits you needed, integrating them into your builds to be as small as possible.
Today we pile libraries on top of libraries on top of libraries. Its super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all 'foolib' contains.
At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?
The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
Its a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching from your dependencies and then dragging it around like a boat anchor of dead code.
> IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem.
Go and C# (.NET) are counterexamples. They both have great ecosystems and just as simple and effective package management as Rust or JS (Node). But neither Go or C# have issues with dependency hell like Rust or even more JavaScript, because they have exceptional std libs and even large frameworks like ASP.NET or EF Core.
A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as counter example. But again, Go and C# are proving them wrong. A great std lib is a solution, but one that comes with huge efforts that can only be made by large organisations like Google (Go) or Microsoft (C#).
No it doesn't.
A large stdlib solves the problems the language is focused on. For C# and Go that is web hosts.
Try using them outside that scope and the dependencies start to pile in (Games, Desktop) or they are essentially unused (embedded, phones, wasm)
> A large stdlib solves the problems the language is focused on
That's part of it, but it also solves the problem of vetting. When I use a Go stdlib I don't have to personally spend time to vet it like it do when looking at a crate or npm package.
In general, Go & Rust packages on github are high quality to begin with, but there is still a pronounced difference between OS packages and what is approved to be part of the language's own stdlib.
It's nice to know thousands of different companies already found the issues for me or objected to them in reviews before the library was published.
“Web server” is a pretty big use case though.
But I agree that graphics is often overlooked in std libs. However that’s a bit of a different beast. Std libs typically deal with what the OS provides. Graphics is its own world so to speak.
As for Wasm: first, that’s a runtime issue and not a language issue. I think GC is on the roadmap for Wasm. Second, Go and C# obviously predate Wasm.
In the end, not every language should be concerned with every use case. The bigger question is whether it provides a std lib for the category of programs it targets.
To take a specific example: JS isn't great at efficiently and conveniently generating dynamic HTML. You can go far with no (or minimal) dependencies and some clever patterns. But a lot of pain and work hours would have been saved if it had something that people want to use out of the box.
> “Web server” is a pretty big use case though.
You don't consider games, desktop and mobile applications big use cases, each being multi billion industries?
I don't know man, I feel like you're arguing in bad faith and are intentionally ignoring what athrowaway3z said: it works there because they're essentially languages specifically made to enable web development. That's why their standard lib is plenty for this domain.
I can understand that web development might be the only thing you care about though, it's definitely a large industry - but the thesis of a large standard lib solving the dependency issue really isn't true, as (almost) every other use case beyond web development shows.
> but the thesis of a large standard lib solving the dependency issue really isn't true, as (almost) every other use case beyond web development shows.
I don't think the dependency issue can be solved by a good std lib, but it certainly can be mitigated as some languages show.
I think JS is a very pronounced case study here.
Web is likely bigger than all of those together. And a large part of mobile and desktop apps depends on web tech these days.
Specifically, those languages are backend focused, so about 28% of developers. 55% focus on front end. If you add up games, desktop, and mobile, oddly you get 28% as well. So not bigger, but the same size: good intuition! That leaves out embedded (8%) and systems (8-12%), which are probably more what Rust is used for. There is obviously overlap, and we haven't mentioned database or scientific programming, at 12 and 5 percent respectively.
Edit: after rereading this I feel like I may have come across as sarcastic; I was legitimately impressed that a guess without looking it up would peg the ratio that closely. It was off topic as a response too. So I'll add that Rust never would have had an async as good as Tokio, or been able to have async in embedded as with Embassy, if it hadn't opted for batteries excluded. I think this was the right call given its initial focus as a desktop/systems language, and it is what allowed it to be more than that as people added things. Use cargo-deny, and pin the oldest version that does what you need and doesn't fail cargo-deny. There are several hundred crates brought in by just the rust-lang repo; if you only vet things not in that list, you can save some time too.
"Web server" is, more or less, about converting a database into JSON and/or HTML. There are complexities there, sure, but it's not like it's some uniquely monumental undertaking compared to other fields.
Not all web servers deal in HTML or JSON, many don't have databases outside of managing their internal state.
Even ignoring that, those are just common formats. They don't tell you what a particular web server is doing.
Take a few examples of some Go projects that either are web servers or have them as major components like Caddy or Tailscale. Wildly different types of projects.
I guess one has to expand "web server" to include general networking as well, which is definitely a well supported use case or rather category for the Go std lib, which was my original point.
> "Web server" is, more or less, about converting a database into JSON and/or HTML
You seem to have a very different definition of "web server" to me.
Just to explain this confusion, the term “web server” typically refers specifically to software that is listening for HTTP requests, such as apache or nginx. I would use the term “application server” to refer to the process that is processing requests that the web server sends to it. I read “web server” in their comment as “application server” and it makes sense.
Yes. That's the same distinction I would expect. Although I'm not sure that the database stuff is the role I'd usually look for in the application server itself.
Maybe it's a language community thing.
actually dotnet also does not need too many dependencies for games and desktop apps.
So it comes out of box with good renderers, physics engines, localization, input controllers and in-game GUIs?
The libraries you listed are too specialized. And they require integration with asset pipeline which is well outside of scope of a programming language.
As for the generic things, I think C# is the only mainstream language which has small vectors, 3x2 and 4x4 matrices, and quaternions in the standard library.
> I think C# is the only mainstream language which has small vectors, 3x2 and 4x4 matrices, and quaternions in the standard library.
They've got SIMD-accelerated methods for calculating 3d projection matrices. No other ecosystem is even close once you start digging into the details.
To be fair, there is no language that has a framework that contains all of these things... unless you're using one of the game engines like Unity/Unreal.
If you're willing to constrain yourself to 2D games, and exclude physics engines (assume you just use one of the Box2D bindings) and also UI (2D gamedevs tend to make their own UI systems anyway)... Then your best bet in the C# world is Monogame (https://monogame.net/), which has lots of successful titles shipped on desktop and console (Stardew Valley, Celeste)
> To be fair, there is no language that has a framework that contains all of these things.
Depends. There is GDScript, seeing as it comes with a game engine.
But the original claim was about the standard library.
If you're including languages that ship with big game engines, it's a tautology: languages with good game engines have good game engines. A general-purpose programming language has very little to gain from including a niche library, even if it's the best in the business. Imagine if C++ shipped with Unreal.
Those are extremely specialized dependencies. Whereas in Rust we are talking about e.g. serde, whose equivalent is included in the std lib of many major languages.
Are you really trying to compare serde to rendering engines?
>A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as counter example.
Python's standard library is big. I wouldn't call it great, because Python is over 30 years old and it's hard to add things to a standard library and even harder to remove them.
There are things added from time to time, but yeah, some stuff in there just feels dated at this point.
I’m still hoping we can get a decently typed argparse with a modern API though (so much better for tiny scripts without deps!)
I'm thankful argparse exists in Python's stdlib. But argument parsing is not that hard, especially for simpler programs. Programmers should be able to think for a minute and figure it out instead of always reaching for clap; that's how you get dependency hell.
Argument parsing, in particular, is a great place to start realizing that you can implement what you need without adding a dozen dependencies.
Hard disagree. Standardized flag parsing is a blessing on us all; I don't want to have to figure out which of the many flag conventions the author picked to implement, like one does with non-getopt C programs.
I don't disagree with the principle, and there are a lot of trivial Python deps, but rolling your own argument parsing is not the way.
Again, argument parsing is not that hard most of the time. You don't have to make your own conventions; that's just weird.
If you've never thought about it, it might seem like you need an off-the-shelf dependency. But as programmers sometimes we should think a bit more before we make that decision.
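To make that concrete in the language this thread is mostly about, a hand-rolled parser for a couple of flags really can stay tiny. A minimal sketch using only std (flag names are just an example):

```rust
use std::env;

fn main() {
    let mut verbose = false;
    let mut output: Option<String> = None;
    let mut inputs: Vec<String> = Vec::new();

    let mut args = env::args().skip(1);
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "-v" | "--verbose" => verbose = true,
            "-o" | "--output" => {
                output = Some(args.next().expect("--output needs a value"));
            }
            "-h" | "--help" => {
                eprintln!("usage: tool [-v] [-o FILE] INPUT...");
                return;
            }
            other => inputs.push(other.to_string()),
        }
    }

    if verbose {
        eprintln!("output: {output:?}, inputs: {inputs:?}");
    }
}
```

Of course this is exactly where the tradeoff mentioned below kicks in: once you need subcommands, combined short flags, or generated help text, a library starts paying for itself.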
Argument parsing is absolutely the kind of thing where I'd reach for a third-party library if the standard library didn't provide (and in Python's case, maybe even then - argparse has some really unpleasant behaviours). When you look through library code, it might seem like way more than you'd write yourself, and it probably is. But on a conceptual level you'll probably actually end up using a big chunk of it, or at least see a future use for it. And it doesn't tend to pull in a lot of dependencies. (For example, click only needs colorama, and then only on Windows; and that doesn't appear to bring in anything transitively.)
It's a very different story with heavyweight dependencies like Numpy (which include reams of tests, documentation and headers even in the wheels that people are only installing to be a dependency of something else, and covers a truly massive range of functionality including exposing BLAS and LAPACK for people who might just want to multiply some small matrices or efficiently represent an image bitmap), or the more complex ones that end up bringing in multiple things completely unrelated to your project that will never be touched at runtime. (Rich supports a ton of wide-ranging things people might want to do with text in a terminal, and I would guess most clients probably want to do exactly one of those things.)
You can, but there's always a tradeoff: as soon as I've added about the 3rd argument, I always wish I had grabbed a library, because I'm not getting paid to reinvent this wheel.
Sure. And that's how you get leftpad and dependency "supply chain" drama.
While not everything in Python's stdlib is great (I am looking at you urllib), I would say most of it is good enough. Python is still my favorite language to get stuff done exactly because of that.
Maybe Python 4 will just remove stuff.
My personal language design is strongly inspired by what I imagine a Python 4 would look like (but also takes hints from other languages, and some entirely new ideas that wouldn't fit neatly in Python).
I don’t want a large std lib. It stifles competition and slows the pace of development. Let libraries rise and fall on their own merits. The std lib should limit itself to the basics.
> but neither Go or C# have issues with dependency hell like Rust or even more JavaScript, because they have exceptional std libs
They also have a much narrower scope of use, which means it is easier to create a stdlib usable for most people. You can't do that with a more generic language.
I would say C# gets used for almost everything at Microsoft: GUIs, backends, DirectX tooling (the new PIX UI, Managed DirectX and XNA back in the Creative Arcade days), Azure, ..., alongside C++, and, even if Microsoft <3 Rust, in much bigger numbers.
I didn't understand the embedded systems argument. Just because a standard lib is large doesn't mean it all ends up in the compilation target.
Indeed, it has no bearing on binary size at all, because none of it will be included. If you are coming from the perspective where the standard library is entirely unusable to begin with, then improving the standard library is irrelevant at best. It also likely means that at least some time and effort will be taken away from improving the things that you can use to be spent on improving a bunch of things that you can't use.
I feel like this is an organizational problem much more than a technical one, though. Rust can be different things to different people, without necessarily forcing one group to compromise overmuch. But some tension is probably inevitable.
> Indeed, it has no bearing on binary size at all, because none of it will be included.
That depends on the language. In an interpreted language (including JIT), or a language that depends on a dynamically linked runtime (e.g. C and C++), it isn't directly included in your app because it is part of the runtime. But you need the runtime installed, and if your app is the only thing that uses that runtime, then the runtime size effectively adds to your installation size.
In languages that statically link the standard library, like go and rust, it absolutely does impact binary size, although the compiler might use some methods to try to avoid including parts of the standard library that aren't used.
Embedded Rust usually means no_std Rust, in which case no, neither the standard library nor any runtime to support it get included in the resulting binary. This isn't getting externalized either; no_std code simply cannot use any of the features that std provides. It is roughly equivalent to freestanding C.
What you say is true enough for external-runtime languages and Go, though TinyGo is available for resource-constrained environments.
Well, Rust's standard library has three components, named core, alloc and std
The no_std Rust only has core, but this is indeed a library of code, and freestanding C does not provide such a thing: the freestanding C stdlib provides no functions, just type definitions and other stuff which evaporates when compiled.
Two concrete examples to go along with this: suppose we have a mutable foo; in Rust it's maybe `foo: [i32; 40]` (forty 32-bit signed integers), and in C maybe it's `int foo[40];`.
In freestanding C that's fine, but we're not provided with any library code to do anything with foo; we can use the core language features to write it ourselves, but nothing is provided.
Rust will happily `foo.sort_unstable()`; this is a fast custom in-place sort, roughly a modern form of introspective sort written for Rust by its creators, and because it's in core, that code just goes into your resulting embedded firmware or whatever.
Now, suppose we want to perform a filter-map operation over that array. In C, once again, you're left to figure out how to write that yourself; in Rust, `foo` impls `IntoIterator`, so you can use all the nice iterator features, and the algorithms just get baked into your firmware during compilation.
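A small illustration of that point, as a hypothetical `#![no_std]` library crate: both the sort and the iterator pipeline come straight from `core`, so they compile for a freestanding target with no allocator and no third-party crate.

```rust
#![no_std] // hypothetical embedded-style library crate: only `core` is available

/// Sort the array in place, then sum the doubled even entries.
/// `sort_unstable`, `iter`, `filter`, `map` and `sum` all live in `core`.
pub fn crunch(foo: &mut [i32; 40]) -> i32 {
    foo.sort_unstable();
    foo.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * 2)
        .sum()
}
```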
I think this is partially true, but more nuanced than just saying that Rust std lib is lacking.
Compared to Go and C#, the Rust std lib is mostly lacking:
- a powerful http lib
- serialization
But Rust's approach (no runtime, no GC, no reflection) makes it very hard to provide those libraries.
Within these constraints, some high quality solutions emerged: Tokio, Serde. But they pioneered some novel approaches which would have been hard to try in the std lib.
The whole async ecosystem still has a beta vibe, giving the feeling of programming in a different language. Procedural macros are often synonymous with slow compile times and code bloat.
But what we gained, is less runtime errors, more efficiency, a more robust language.
TLDR: trade-offs everywhere, it is unfair to compare to Go/C# as they are languages with a different set of constraints.
I would say compared to other languages Rust feels even more lacking.
All those AFAIR need 3rd party packages:
Regex, DateTime, base64, argument parsing, url parsing, hashing, random number generation, UUIDs, JSON
I'm not saying it's mandatory, but I would expect all those to be in the standard library before there is any http functionality.
Having some of those libraries listed and then not being able to change API or the implementation is what killed modern C++ adoption (along with the language being a patchwork on top of C).
As some of the previous commenters said, when you focus your language to make it easy to write a specific type of program, then you make tradeoffs that can trap you in those constraints like having a runtime, a garbage collector and a set of APIs that are ingrained in the stdlib.
Rust isn't like that. As a system programmer I want none of them. Rust is a systems programming language. I wouldn't use Rust if it had a bloated stdlib. I am very happy about its stdlib. Being able to swap out the regex, datetime, arg parsing and encoding are a feature. I can choose memory-heavy or cpu-heavy implementations. I can optimize for code size or performance or sometimes neither/both.
If the trade-offs were made to appease the easy (web/app) development, it wouldn't be a systems programming language for me where I can use the same async concepts on a Linux system and an embedded MCU. Rust's design enables that, no other language's design (even C++) does.
If a web developer wants to use a systems programming language, that's their trade-off for a harder-to-program language. Similar type safety to Rust's is provided by Kotlin or Swift.
Dependency bloat is indeed a problem. Easy inclusion of dependencies is also a contributing factor. This problem can be solved by making dependencies and features granular. If the libraries don't provide the granularity you want, you need to change libraries/audit source/contribute. No free meals.
Yeah I’ve encountered the benefit of this approach recently when writing WASM binaries for the web, where binary size becomes something we want to optimize for.
The de facto standard regex library (which is excellent!) brings in nearly 2 MB of additional content for correct unicode operations and other purposes. The same author also makes regex-lite, though, which did everything we need, with the same interface, in a much smaller package. It made it trivial to toss the functionality we needed behind a trait and choose a regex library appropriately in different portions of our stack.
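A rough sketch of that pattern (crate and feature names assumed, and simplified to a newtype rather than a full trait): a cargo feature decides whether `regex` or `regex-lite` backs the one operation the rest of the stack sees.

```rust
// Pick the backing crate at compile time via a (made-up) cargo feature.
#[cfg(feature = "full-regex")]
mod backend {
    pub type Inner = regex::Regex;
}

#[cfg(not(feature = "full-regex"))]
mod backend {
    pub type Inner = regex_lite::Regex;
}

/// The only regex-shaped thing the rest of the codebase gets to see.
pub struct Matcher(backend::Inner);

impl Matcher {
    pub fn new(pattern: &str) -> Self {
        Self(backend::Inner::new(pattern).expect("valid pattern"))
    }

    pub fn is_match(&self, line: &str) -> bool {
        // regex-lite deliberately mirrors regex's API, so this body is
        // identical for both backends.
        self.0.is_match(line)
    }
}
```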
> Being able to swap out the regex, datetime, arg parsing and encoding are a feature
A feature present on every language that has those in the stdlib.
Not necessarily, when other components of the stdlib depend on them
Also not necessarily with third-party libraries.
Indeed. However, you need to recognize that having those features in the stdlib creates a huge bias against swapping them out. How many people in Java actually use alternative DB APIs to JDBC? How many alternative encoding libraries are out there for JSON in Go? How about async runtimes, can you replace that in Go easily?
True! Although it’s easier to swap out a third party lib that’s using a bloated dependency than it is to avoid something in std.
> All those AFAIR need 3rd party packages: Regex
Regex is not 3rd party (note the 'rust-lang' in the URL):
https://github.com/rust-lang/regex
3rd party relative to the standard library. In other words: not included.
Create a new library, name it "Standard library", include and re-export all those libraries, profit.
This won't solve supply chain issues.
Linux distributions are built this way. Distro maintainers selects libraries and versions to include, to create solid foundation for apps.
Which still doesn't solve the supply chain issues...
> Procedural macros are often synonymous with slow compile times and code bloat.
In theory they should reduce it, because you wouldn't write proc macros to generate code you don't need…right? How much coding time do you save with macros compared to implementing things manually?
To be fair I think Rust has very healthy selection of options for both, with Serde and Reqwest/Hyper being de-facto standard.
Rust has other challenges it needs to overcome but this isn't one.
I'd put Go behind both C#/F# and Rust in this area. It has spartan tooling in odd areas it's expected to be strong at, like gRPC, and the serialization story in Go is quite a bit more painful and bare-bones compared to what you get out of System.Text.Json and Serde.
The difference is especially stark with regex, where Go ships with a slow engine (because it does not allow writing sufficiently fast code in this area at this moment), whereas both Rust and C# have top-of-the-line implementations, each of which beats every other engine save for Intel Hyperscan[0].
[0]: https://github.com/BurntSushi/rebar?tab=readme-ov-file#summa... (note this is without .NET 9 or 10 preview updates)
> (because it does not allow writing sufficiently fast code in this area at this moment)
I don't think that's why. Or at least, I don't think it's straight-forward to draw that conclusion yet. I don't see any reason why the lazy DFA in RE2 or the Rust regex crate couldn't be ported to Go[1] and dramatically speed things up. Indeed, it has been done[2], but it was never pushed over the finish line. My guess is it would make Go's regexp engine a fair bit more competitive in some cases. And aside from that, there's tons of literal optimizations that could still be done that don't really have much to do with Go the language.
Could a Go-written regexp engine be faster or nearly as fast because of the language? Probably not. But I think the "implementation quality" is a far bigger determinant in explaining the current gap.
[1]: https://github.com/golang/go/issues/11646
[2]: https://github.com/matloob/regexp
> At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
I'm not convinced that happens that often.
As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the times it turned out that they were downstream of things we needed: Vulkan support, PNG decoding, unicode shaping, etc.
When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).
We're looking to remove or at least decouple larger dependencies like winit and wgpu, but that requires some major architectural changes, it's not just "remove this runtime option and win 500MB".
Not in Rust, but I've seen it with Python in scientific computing. Someone needs to do some minor matrix math, so they install numpy. Numpy isn't so bad, but if installing it via conda it pulls in MKL, which sits at 171MB right now (although I have memories of it being bigger in the past). It also pulls in intel-openmp, which is 17MB.
Just so you can multiply matrices or something.
> Someone needs to do some minor matrix math, so they install numpy
I’m just not convinced that it’s worth the pain to avoid installing these packages.
You want speedy matrix math. Why would you install some second rate package just because it has a lighter footprint on disk? I want my dependencies rock solid so I don’t have to screw with debugging them. They’re not my core business - if (when) they don’t “just work” it’s a massive time sink.
NumPy isn’t “left pad” so this argument doesn’t seem strong to me.
Because Rust is already paying the price of compiling everything from scratch on a release build, you can pay a little extra to turn on link-time optimization and turn off parallel codegen on release builds, and absolutely nothing gets compiled in that you don't use, and nothing gets repeated. Also, enabling symbol stripping can take something with tokio, clap, serde, and nalgebra (matrix stuff) and still produce a 2-5 MB binary. That is still huge to me because I'm old, but you can get it smaller if you want to recompile std along with your other dependencies.
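For reference, those knobs roughly correspond to a release profile like this in Cargo.toml (values are an example, not a recommendation):

```toml
[profile.release]
lto = true          # whole-program link-time optimization across all crates
codegen-units = 1   # less parallel codegen, better optimization and dead-code removal
strip = "symbols"   # strip symbol tables from the final binary
```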
MKL is usually what you want if you are doing matrix math on an Intel CPU.
A better design is to make it easy you to choose or hotswap your BLAS/LAPACK implementation. E.g. OpenBLAS for AMD.
Edit: To be clear, Netlib (the reference implementation) is almost always NOT what you want. It's designed to be readable, not optimized for modern CPUs.
I would argue that BLIS is what you want. It is proper open source and not tied to Intel platforms.
I was very 'impressed' to see multiple SSL libraries pulled into rust software that never makes a network connection.
This is where a) a strong stdlib and b) community consensus on common packages tends to help at least mitigate the problem.
My feeling is that Python scores fairly well in this regard. At least it used to. I haven't been following closely in recent years.
A lot of people dunk on Java, but its standard library is rock solid. It even is backward compatible (mostly).
Did you dig any deeper into which paths pulled that in?
Symbol culling and dead code removal is already a thing in modern compilers and linkers, and rust can do it too: https://github.com/johnthagen/min-sized-rust
Others have made similar comments, but tree-shaking, symbol culling, and anything else that removes dead code after it's already been distributed and/or compiled is too late IMO. It's a band-aid on the problem. A useful and pragmatic band-aid today, for sure, but it fundamentally bothers me that we have to spend time compiling code and then spend more time analyzing it and ripping it back out.
Part of the issue I have with the dependency bloat is how much effort we currently go through to download, distribute, compile, lint, typecheck, whatever 1000s of lines of code we don't want or need. I want software that allows me to build exactly as much as I need and never have to touch the things I don't want.
> Others have made similar comments, but tree-shaking, symbol culling and anything else that removes dead code after its already been distributed and/or compiled is too late IMO.
Why, in principle, wouldn't the same algorithms work before distribution?
For that matter, check out the `auditwheel` tool in the Python ecosystem.
As others have pointed out elsewhere, that only removes static dependencies. If you have code paths that are used depending on dynamic function arguments static analysis is unable to catch those.
For example, you have a function calling XML or PDF or JSON output functions depending on some output format parameter. That's three very different paths and includes, but if you don't know which values that parameter can take during runtime you will have to include all three paths, even if in reality only XML (for example) is ever used.
Or there may be higher level causes outside of any analysis, even if you managed a dynamic one. In a GUI, for example, it could be functionality only ever seen by a few with certain roles, but if there is only one app everything will have to be bundled. Similar scenarios are possible with all kinds of software, for example an analysis application that supports various input and output scenarios. It's a variation of the first example where the parameter is internal, but now it is external data not available for an analysis because it will be known only when the software is actually used.
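A tiny self-contained illustration of that first example (all names hypothetical): because the format only arrives at runtime, the compiler has to keep every branch, and everything each branch transitively calls, even if real users only ever ask for JSON.

```rust
struct Document {
    title: String,
}

// Imagine each of these pulling in a large chunk of a library.
fn write_xml(doc: &Document) -> Vec<u8> {
    format!("<doc><title>{}</title></doc>", doc.title).into_bytes()
}

fn write_json(doc: &Document) -> Vec<u8> {
    format!("{{\"title\":\"{}\"}}", doc.title).into_bytes()
}

fn export(doc: &Document, format: &str) -> Vec<u8> {
    // `format` comes from user input, so static analysis cannot prove the
    // XML path unreachable, even if no caller ever passes "xml".
    match format {
        "xml" => write_xml(doc),
        _ => write_json(doc),
    }
}

fn main() {
    let doc = Document { title: "report".into() };
    let fmt = std::env::args().nth(1).unwrap_or_else(|| "json".into());
    println!("{} bytes", export(&doc, &fmt).len());
}
```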
The situation isn't quite as dire as you portray. Compilers these days can also do devirtualization. The consequent static calls can become input to tree shaking in the whole program case. While it's true that we can't solve the problem in general, there's hope for specific cases.
Way back when, I used to vendor all the libraries for a project (Java/Cpp/Python) into a mono repo and integrate building everything into the projects build files so anyone could rebuild the entire app stack with whatever compiler flags they wanted.
It worked great, but it took diligence, it also forces you to interact with your deps in ways that adding a line to a deps file does not.
One nice thing about cargo is that it builds all your code together, which means you can pass a unified set of flags to everything. The feature of building everything all the time as a whole has a bunch of downsides, many which are mentioned elsewhere, but the specific problem of not being able to build dependencies the way you want isn't one.
This is the default way of doing things in the monorepo(s) at Google.
It feels like torture until you see the benefits, and the opposite ... the tangled mess of multiple versions and giant transitive dependency chains... agony.
I would prefer to work in shops that manage their dependencies this way. It's hard to find.
I've never seen a place that does it quite like Google. Is there one? It only works if you have one product or are a giant company as it's really expensive to do.
Being able to change a dependency very deep and recompile the entire thing is just magic though. I don't know if I can ever go back from that.
It's the same that we're doing for external crates in QEMU's experiments with Rust. Each new dependency is added to the build by hand.
It isnt the default way at Google. Just the way in some parts of Google.
It absolutely is so, or was for the 10 years I was there. I worked on Google3 (in Ads, on Google WiFi, on Fiber, and other stuff), in Chromium/Chromecast, Fiber, and on Stadia, and every single one of those repos -- all different repositories -- used vendored deps.
I would absolutely do this for any non-toy project.
Alternatively, for some project it might be enough to only depend on stuff provided by Debian stable or some other LTS distro.
Maven was the one that started the downfall into dependency hell. (Ant as well, but it was harder to blindly include things into it.)
Kids today don't know how to do that anymore...
Yet the Maven repository is still not that bloated, even after 20+ years of Java et al. being among the most popular languages.
Compare that to Rust, where my experience with protobuf libs some time ago was that there is a choice of not one but three different libraries: one doesn't support services, another didn't support the syntax we had to support, and the third was unmaintained. So out of three choices not a single one worked.
Compare that to Maven, where you have only one officially supported choice that works well and is well maintained.
More time enables more consolidation.
No, there were never several unofficial libraries, one of which eventually won the popularity contest. There was always only one official. There is some barrier to add your project there, so might be that helped.
It's even more pronounced with the main Java competitor: .NET. They look at what approach won in the Java ecosystem and go all in. For example, there were multiple ORM tools competing, and Microsoft adopted the most popular one. So it's an even easier choice there, well supported and maintained.
> Microsoft adopted the most popular one
That's still consolidation, and it also needs time.
Even in Rust, crates like hashbrown or parking_lot have been basically subsumed into the standard library.
This works very well until different parts of the deps tree start pulling in the same Foo with slightly different flags/settings. Often for the wrong reasons but sometimes for the right ones, and then it's a new kind of “fun”. Sometimes the build system is there to help you, but sometimes you are on your own. Native languages like C++ bring a special kind of joy called ODR violations to the mix…
> Its super easy to say `import foolib`, then call `foolib.do_thing()` and just start running.
It's effectively an end-run around the linker.
It used to be that you'd create a library by having each function in its own compilation unit, you'd create a ".o" file, then you'd bunch them together in a ".a" archive. When someone else is compiling their code, and they need the do_thing() function, the linker sees it's unfulfilled and plucks it out of the foolib.a archive. For namespacing you'd probably call the functions foolib_do_thing(), etc.
However, object-orientism with a "god object" is a disease. We go in through a top-level object like "foolib" that holds pointers to all its member functions like do_thing(), do_this(), do_that(), then the only reference the other person's code has is to "foolib"... and then "foolib" brings in everything else in the library.
It's not possible for the linker to know if, for example, foolib needed the reference to do_that() just to initialise its members, and then nobody else ever needed it, so it could be eliminated, or if either foolib or the user's code will somehow need it.
> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
I can say that, at least for Go, it has excellent dead code elimination. If you don't call it, it's removed. If you even have a const feature_flag = false and have an if feature_flag { foobar() } in the code, it will eliminate foobar().
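The same holds for Rust (sticking with the thread's main language for the sketch; the flag name is made up): with the constant false, optimized builds can prove the call unreachable and drop `expensive_feature`, plus anything only it calls, from the binary.

```rust
const FEATURE_FLAG: bool = false;

fn expensive_feature() {
    // Pretend this drags in a pile of code from some dependency.
    println!("doing the optional thing");
}

fn main() {
    if FEATURE_FLAG {
        expensive_feature(); // statically unreachable: eliminated in optimized builds
    }
    println!("done");
}
```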
>At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
So, what is the compiler doing that it doesn't remove unused code?
"dependency" here I guess means something higher-level that your compiler can't make the assumption you will never use.
For example you know you will never use one of the main functions in the parsing library with one of the arguments set to "XML", because you know for sure you don't use XML in your domain (for example you have a solid project constraint that says XML is out of scope).
Unfortunately the code dealing with XML in the library is 95% of the code, and you can't tell your compiler I won't need this, I promise never to call that function with argument set to XML.
Why can't the compiler detect that it will not be used? Tree shaking is well implemented in JavaScript compilers, an ecosystem which suffers extensively from this problem. It should be possible to build a dependency graph and analyze which functions might actually end up in scope. After all, the same is already done for closures.
As a poster points out deeper in the thread below, something like this can happen:
`doc_format = get_user_input()` followed by `parsed_doc = foolib.parse(doc_format)`
You as the implementer might know the user will never input xml, so doc_format can't be 'xml' (you might even add some error handling if the user inputs this), but how can you communicate this to the compiler?
That's called bad library design. Rather than a global, make an instantiated parser that takes in specific codecs.
It doesn't matter; if the format comes from runtime input, then the compiler will not know.
What you're calling "tree shaking" is more commonly called "dead code elimination" in compilers, and is one of the basic optimisations that any production compiler would implement.
A surprising amount of code might be executed in rarely-used or undocumented code paths (for example, if the DEBUG environment variable is 1 or because a plugin is enabled even if not actually used) and thus not shaken out by the compiler.
What makes you think that a lot of code is hidden behind a debug env variable instead of, e.g., a debug build?
Plenty of libraries with "verbose" logging flags ship way more than assumed. I remember lots of NPM libs that require `winston`, for example, are runtime-configurable. Or Java libraries that require Log4j. With Rust it's getting hard to remember, because everything today seems to pull in the fucking kitchen sink...
And even going beyond "debug", plenty of libraries ship features that are downright unwanted by consumers.
The two famous recent examples are Heartbleed and Log4shell.
> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
Clarification: Go allows for a very simple multi-file setup. It's one feature I really like, because it allows splitting an otherwise coherent module into logical parts.
Further: I’ve never seen rust encourage anything of the sort. Module directory with a mod.rs and any number of files works just fine.
I probably mischaracterized this, as it's been a while since I did more than trivial Rust. AFAIK it's not possible to depend on only a part of a module in Rust though, right? (At least without an external build system.)
For example, you can't split up a module into foo.rs containing `Foo` and bar.rs containing `Bar`, both in module 'mymod' in such a way that you can `use mymod::Bar and foo.rs is never built/linked.
My point is that the granularity of the package/mod encourages coarse-grained deps, which I argue is a problem.
You'd use feature flags to enable certain parts of the library.
> not possible to depend on only a part of a module in Rust though right
yesn't, you can use feature flags similar to `#if` in C
but it's also not really a needed feature, as dead code elimination will prune out all the code (functions, types, etc.) you don't use. None of it will end up in the produced binary.
Yeah, likewise Rust is completely fine after you say `mod foo` and have a file named foo.rs, if you also make a foo/ directory and put foo/whatever.rs and foo/something_else.rs that those are all part of the foo module.
Historically Rust wanted that foo.rs to be renamed foo/mod.rs but that's no longer idiomatic although of course it still works if you do that.
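As a concrete sketch of that layout (spanning three files, shown here in one block with the paths as comments; the file names come from the comment above, the helper function is made up):

```rust
// src/lib.rs
mod foo;                    // module body lives in src/foo.rs

// src/foo.rs
pub mod whatever;           // body in src/foo/whatever.rs
pub mod something_else;     // body in src/foo/something_else.rs
pub fn shared_helper() {}   // items can also live directly in foo.rs
```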
to extend on this:
in Rust, crates are semantically one compilation unit (where in C, oversimplified, it's a .h/.c pair; in practice rustc will try to split a crate into more units to speed up build time).
the reason I'm pointing this out is because many sources of "splitting a module across files" come from situations where one file is one compilation unit, so you needed a way to split it (for organization) without splitting it (for compilation)
Not just multiple files, but multiple directories. One versioned dependency (module) usually consists of dozens of directories (packages) and dozens to hundreds of files. Only newcomers from other languages create too many go.mod files when they shouldn't.
This idea is already implemented in Dotnet, with Trimming and now ahead of time compilation (AOT). Maybe other languages can learn from dotnet?
https://learn.microsoft.com/en-us/dotnet/core/deploying/trim...
https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...
dead code elimination is very old hat
which gets reinvented all the time, like in .NET with "trimming" or in JS with "tree-shaking".
C/C++ compilers have been doing that since before .NET was a thing; same for Rust, which has done it since its 1.0 release (because it's done by LLVM ;) )
The reason it gets reinvented all the time is because while it's often quite straightforward in statically compiled languages it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.
Which also brings us to one area where it's not out of the box: if you build a .dll/.so in one build process and then use it in another. Here additional tooling is needed to prune the dynamically linked libraries. But luckily it's not a common problem to run into in Rust.
In general most code size problems in Rust aren't caused by too huge LOC of dependencies but by an overuse of monopolization. The problem of tons of LOC in dependencies is one of supply chain trust and reviewability more than anything else.
> The reason it gets reinvented all the time is because while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.
It seems to me that, in a strict sense, the problem of eliminating dead code may be impossible for code that uses some form of eval(). For example, you could put something like eval(decrypt(<encrypted code>, key)) for a user-supplied key (or otherwise obfuscated), or simply eval(<externally supplied code>); both of which could call previously dead code. Although it seems plausible to rule out such cases. Without eval() some of the problem seems very easy otoh: unused functions can simply be removed!
And of course there are more classical, halting-problem-like impediments, which in general show that telling whether a piece of code is ever executed is undecidable.
( Of course, we can still write conservative decisions that only cull a subset of easy to prove dead code -- halting problem is indeed decidable if you are conservative and accept "I Don't Know" as well as "Halts" / "Doesn't Halt" :) )
Yes, even without Eval, there's a ton of reflective mechanisms in JS that are technically broken by dead code elimination (and other transforms, like minification), but most JS tools make some pretty reasonable assumptions that you don't use these features. For example, minifiers assume you don't rely on specific Function.name property being preserved. Bundlers assume you don't use eval to call dead code, too.
reflective code is evil.
> In general, most code size problems in Rust aren't caused by the sheer LOC of dependencies but by an overuse of monopolization
*monomorphization, in case anyone got confused
And here I thought that Rust already killed unions
Those are done at compile time. Many languages (including Rust, which this story is about) also remove unused symbols at compile time.
The comment you're replying to is talking about not pulling in dependencies at all, before compiling, if they would not be needed.
> Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call,
It’s getting hard to take these conversations seriously with all of the hyperbole about things that don’t happen. Nobody is producing Rust binaries that hit 500MB or even 50MB from adding a couple simple dependencies.
You’re also not ending up with mountains of code that never gets called in Rust.
Even if my Rust binaries end up being 10MB instead of 1MB, it doesn’t really matter these days. It’s either going on a server platform where that amount of data is trivial or it’s going into an embedded device where the few extra megabytes aren’t really a big deal relative to all the other content that ends up on devices these days.
For truly space-constrained systems there's no-std and an entire, albeit small, separate universe of packages that operate in that space.
For all the doom-saying, in Rust I haven’t encountered this excessive bloat problem some people fret about, even in projects with liberal use of dependencies.
Every time I read these threads I feel like the conversations get hijacked by the people at the intersection of “not invented here” and nostalgia for the good old days. Comments like this that yearn for the days of buying paid libraries and then picking them apart anyway really reinforce that idea. There’s also a lot of the usual disdain for async and even Rust itself throughout this comment section. Meanwhile it feels like there’s an entire other world of Rust developers who have just moved on and get work done, not caring for endless discussions about function coloring or rewriting libraries themselves to shave a few hundred kB off of their binaries.
I agree on the bloat: considering my Rust projects typically don't use any shared libraries other than a libc, a few MB for a binary including hundreds of crates in dependencies (most of which are part of rustc or cargo itself) doesn't seem so bad. I do get the async thing. It just isn't the right tool for most of my needs. Unless you are in the situation where you need to wait faster (for connections, usually), threads are better for trying to compute faster than async is.
I don't think libraries are the problem, but we don't have a lot of visibility after we add a new dependency. You either take the time to look into it, or just add it and then forget about the problem (which is kind of the point of having small libraries).
It should be easy to build and deploy profiling-aware builds (PGO/BOLT) and to get good feedback around time/instructions spent per package, as well as a measure of the ratio of each library that's cold or thrown away at build time.
I agree that I don't like thinking of libraries as the problem. But they do seem to be the easiest area to point at for a lot of modern development hell. It's kind of crazy.
I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.
Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.
This gets equally frustrating when our metrics for determining the safety of something largely discourage inaction on any dependencies. They have to keep adding to it, or people think it is abandoned and not usable.
Note that this isn't unique to software, mind. Hardware can and does go through massive changes over the years. They have obvious limitations that slow down how rapidly they can change, of course.
> Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.
I'm not sure what the problem is here.
Are you after pinning dependencies to be sure they didn't change? Generally I want updating dependencies to fix bugs in them.
Are you after trusting them through code review or tests? I don't think there's shortcuts for this. You shouldn't trust a library, changing or not, because old bugs and new vulnerabilities make erring on both sides risky. On reviewing other's code, I think Rust helps a bit by being explicit and fencing unsafe code, but memory safety is not enough when a logic bug can ruin your business. You can't avoid testing if mistakes or crashes matter.
> I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.
Well, it's not required to trim code that you can prove unreachable, true. But I was thinking about trying to measure whether a given library really pulls its non-zero weight, and how much CPU is spent in it.
A library taking "too much time" for something you think can be done faster might need replacement, or swapping for a simple implementation (say the library cares about edge cases you don't face or can avoid).
> It's a terrible idea...
It's a terrible idea because you're trying to reinvent section splitting + `--gc-sections` at link time, which rust (which the article is about) already does by default.
The article is about Rust, but I was commenting on dependencies in general.
Things like --gc-sections feel like a band-aid: a very practical and useful band-aid, but a band-aid nonetheless. You're building a bunch of things you don't need, then selectively throwing away parts (or selectively keeping parts).
IMO it all boils down to the granularity. The granularity of text source files, the granularity of units of distribution for libraries. It all contributes to a problem of large unwieldy dependency growth.
I don't have any great solutions here; these are just observations of the general problem, drawn from the horrifying things that happen when dependencies grow uncontrolled.
As far as I'm aware, LTO completely solves this from a binary size perspective. It will optimise out anything unused. You can still get hit from a build time perspective though.
"completely solves" is a bit of an overstatement. Imagine a curl-like library that allows you to make requests by URL. You may only ever use HTTP urls, but code for all the other schemas (like HTTPS, FTP, Gopher) needs to be compiled in as well.
This is an extreme example, but the same thing happens very often at a smaller scale. Optional functionality can't always be removed statically.
That only applies when dynamic dispatch is involved and the linker can't trace the calls. For direct calls and generics(which idiomatic Rust code tends to prefer over dyn traits) LTO will prune extensively.
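A small, made-up illustration of that distinction: the generic path is monomorphized per concrete type, so an unused impl is simply never instantiated, while the `dyn` path goes through a vtable, and unused impls can only be dropped if the optimizer can prove no vtable referencing them is ever built.

```rust
trait Fetch {
    fn fetch(&self, url: &str) -> String;
}

struct Http;

#[allow(dead_code)] // never constructed in this example, which is exactly the point
struct Ftp;

impl Fetch for Http {
    fn fetch(&self, url: &str) -> String {
        format!("GET {url}")
    }
}

impl Fetch for Ftp {
    fn fetch(&self, url: &str) -> String {
        format!("RETR {url}")
    }
}

// Monomorphized: if nobody ever calls this with `Ftp`, no Ftp code is generated.
fn fetch_generic<F: Fetch>(backend: &F, url: &str) -> String {
    backend.fetch(url)
}

// Dynamic dispatch: the call goes through a vtable, so the linker has to reason
// about which vtables can ever exist before it may discard an impl.
fn fetch_dyn(backend: &dyn Fetch, url: &str) -> String {
    backend.fetch(url)
}

fn main() {
    println!("{}", fetch_generic(&Http, "http://example.com"));
    println!("{}", fetch_dyn(&Http, "http://example.com"));
}
```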
So what happens if the user passes an url containing ftp:// or even https:// to stdin? Or is this an HTTP only library?
Depends on what is desired; in this case it would fail (through the `?`) and report that it's not a valid HTTP URI. This would be for a generic parsing library that allows multiple schemes to be parsed, each with its own parsing rules.
If you want to mix schemes you would need to be able to handle all schemes; you can either go through all the variations you want to test (through the same generics) or just accept that you need a full URI parser and lose the generics.
If you want to mix schemes you should just mix schemes.
See, the trait system in Rust actually forced you to discover your requirements at a very core level. It is not a bug, but a feature. If you need HTTPS, then you need to include the code to do HTTPS of course. Then LTO shouldn't remove it.
If your library cannot parse FTP, either you enable that feature, add that feature, or use a different library.
No, this wouldn't work. The type of the request needs to be dynamic because the user can pass in any URI.
Then they can also pass in an erroneous URI. You still need some way to deal with the ones you're not accepting.
I guess that depends on the implementation. If you're calling through an API that dynamically selects the protocol, then I guess it wouldn't be removable.
Rust does have a feature flagging system for this kind of optional functionality, though. It's not perfect, but it would work very well for something like curl's protocol backends.
That's a consequence of crufty complicated protocols and standards that require a ton of support for different transports and backward compatibility. It's hard to avoid if you want to interoperate with the whole world.
yes, it's not an issue of code size but an issue of supply chain security/reviewability
it's also not always a fair comparison: if you include tokio in the LOC count, then you surely would also include V8's LOC when counting for Node, or the JRE for Java projects (but not the JDK), etc.
And, reductio ad absurdum, you perhaps also need to count those 27 million LOC in Linux too. (Or however many LOC there are in Windows or macOS or whatever other OS is a fundamental "dependency" for your program.)
Or you could use APE and then all of those LOC go away. APE binaries can boot metal, and run on the big 3 OS from the same file.
It's certainly better than in Java where LTO is simply not possible due to reflection. The more interesting question is which code effectively gets compiled so you know what has to be audited. That is, without disassembling the binary. Maybe debug information can help?
Not only is it possible, it has been available for decades in commercial AOT compilers like Aonix, Excelsior JET, PTC, Aicas.
It is also done on the cousin Android, and available as free beer on GraalVM and OpenJ9.
Those all break compatibility to achieve that.
No they don't, PTC, Aicas, GraalVM and OpenJ9 support reflection.
The others no longer matter, out of business.
You can't LTO code in the presence of reflection. You can AOT compile, but there will always be a "cold path" where you have to interpret whatever is left.
Yet it works, thanks to additional metadata: either in a dynamic compiler, which effectively does it in memory, throwing away execution paths with traps to redo them when required, or with PGO-like metadata for AOT compilation.
And since we are always wrong unless proven otherwise,
https://www.graalvm.org/jdk21/reference-manual/native-image/...
https://www.graalvm.org/latest/reference-manual/native-image...
You do understand that the topic at hand is not shipping around all that code needed to support a trap, right?
In Go, the symbol table contains enough information to figure this out. This is how https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck is able to limit vulnerabilities to those that are actually reachable in your code.
The symbol table might contain reflection metadata, but it surely can't identify what part of it will be used.
It's possible and in recent years the ecosystem has been evolving to support it much better via native-image metadata. Lots of libraries have metadata now that indicates what's accessed via reflection and the static DCE optimization keeps getting better. It can do things like propagate constants to detect more code as dead. Even large server frameworks like Micronaut or Spring Native support it now.
The other nice thing is that bytecode is easy to modify, so if you have a library that has some features you know you don't want, you can just knock it out and bank the savings.
Doesn't Java offer some sort of trimming like C#? I know it won't remove everything, but at least it can trim down a lot of things.
Yes, jlink, code guard, R8/D8 on Android, if you want to stay at the bytecode level, plus all the commercial AOT compilers and the free beer ones, offer similar capabilities at the binary level.
Everywhere in this thread is debating whether LTO "completely" solves this or not, but why does this even need LTO in the first place? Dead code elimination across translation units in C++ is traditionally accomplished by something like -ffunction-sections, as well as judiciously moving function implementations to the header file (inline).
Clang also supports virtual function elimination with -fvirtual-function-elimination, which AFAIK currently requires full LTO [0]. Normally, the virtual functions can't be removed because the vtable is referencing them. It's very helpful in cutting down on bloat from our own abstractions.
[0] https://clang.llvm.org/docs/ClangCommandLineReference.html#c...
> As far as I'm aware, LTO completely solves this from a binary size perspective.
I wouldn't say completely. People still sometimes struggle to get this to work well.
Recent example: (Go Qt bindings)
https://github.com/mappu/miqt/issues/147
LTO only gets you so far, but IMO it's more kicking the can down the road.
The analogy I use is cooking a huge dinner, then throwing out everything but the one side dish you wanted. If you want just the side-dish you should be able to cook just the side-dish.
I see it more as having a sizable array of ingredients in the pantry, and using only what you need or want for a given meal.
Then another group of armchair programmers will bitch you out for using small dependencies
I just don't listen. Things should be easy. Rust is easy. Don't overthink it
Some of that group of armchair programmers remember when npm drama and leftpad.js broke a noticeable portion of the internet.
Sure, don't overthink it. But underthinking it is seriously problematic too.
LTO gets a lot of the way there, but it won't for example help with eliminating unused enums (and associated codepaths). That happens at per-crate MIR optimisation iirc, which is prior to llvm optimisation of LTO.
The actual behavior of Go seems much closer to your ideal scenario than what you attribute to it, although it is more nuanced, so both are true. In Go, a module is a collection of packages. When you go get a module, the entire module is pulled onto the host, but when you vendor, only the packages you use (and I believe only the symbols used from those packages, but I am not certain) are vendored into your module as dependencies.
There's an interesting language called Unison, which implements part of this idea (the motivation is a bit different, though)
Functions are defined by AST structure and are effectively content addressed. Each function is then keyed by hash in a global registry where you can pull it from for reuse.
> In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
What? I don't know about Go, but this certainly isn't true in Rust. Rust has great support for fine-grained imports via Cargo's ability to split up an API via crate features.
> At each level a caller might need 5% of the functionality of any given dependency.
I think that is much more of a problem in ecosystems where it is harder to add dependencies.
When it is difficult to add dependencies, you end up with large libraries that do a lot of stuff you don't need, so you only need to add a couple of dependencies. On the other hand, if dependency management is easy, you end up with a lot of smaller packages that just do one thing.
> The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
Or you have ultra-fine-grained modules, and rely on existing tree-shaking systems.... ?
If you think about it, every function already declares what it needs simply by actually using it. You know if a function needs another function because it calls it. So what exactly are you asking? That the programmer insert a list of dependent functions in a comment above every function? The compiler could do that for you. The compiler could help you and go up a level and insert the names of modules the functions belong to?
My understanding is that the existing algorithms for tree shaking (dead code elimination, etc. etc. whatever you want to call it) work exactly on that basis. But Python is too dynamic to just read the source code and determine what's used ahead of time. eval and exec exist; just about every kind of namespace is reflected as either a dictionary or an object with attributes, and most are mutable; and the import system works purely at runtime and has a dazzling array of hooks.
The late Joe Armstrong had an idea about open source that it should be just a collection of functions that we publish. It would solve this problem.
> The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library.
That’s literally the JS module system? It’s how we do tree shaking to get those bundle sizes down.
As many others as mentioned, "tree shaking" is just a rebranded variation of dead code elimination which is a very old idea. I don't think JS does what OP is suggesting anyway, you certainly don't declare the exact dependencies of each function.
Small libraries are nice for keeping things lean, but are npm's isEven, isOdd and leftpad really the right solution? Instead of a bunch of small libraries maintained by many individual maintainers, I'd prefer a larger lib maintained by a group, where continuity is more likely and the different parts work together.
I am just a college student, so sorry if this is stupid, but we know that the Rust compiler can detect unused code, variables, functions and so on, as can IDEs for all languages. So why don't we just remove those parts? The unused code is just not compiled.
Mainly because in some libs some code is activated at runtime.
A lot of the bloat comes from functionality that can be activated via flags, methods that set a variable to true, environment variables, or even via configuration files.
When talking about LTO we don't expect it to remove code that is used at runtime. Such code is not dead code, by definition.
If you want to disable certain runtime features, you'd do so with feature flags.
Sure, but I'm talking about bloat in libraries that don't get LTO'd. If there are no feature flags and no plugin functionality, LTO can't do its job. There are plenty of non-core libraries like this.
OTOH it also depends on the architecture you build. If you have a local-first thick client, the initial install of 800 MB is less relevant if, after install, you communicate over a tightly controlled (by you) p2p networking stack, but take on heavy dependencies in the UI layer to provide e.g. infinite-canvas collaboration and diagramming.
This has been the #1 way to achieve code re-use and I am all for it. Optimize it in post where it is necessary and build things faster with tested code.
> In the 80s the idea of a library of functionality was something you paid for, and painstakingly included parts of into your size constrained environment (fit it on a floppy). You probably picked apart that library and pulled the bits you needed, integrating them into your builds to be as small as possible.
If anything, the 1980s is when the idea of fully reusable, separately-developed software components first became practical, with Objective-C and the like. In fact it's a significant success story of Rust that this sort of pervasive software componentry has now been widely adopted as part of a systems programming language.
You're talking about a different 80s. On workstations and Unix mainframes, beasts like Smalltalk and Objective-C roamed the Earth. On home computers, a resident relocatable driver that wasn't part of ROM was an unusual novelty.
Yeah, 1990s is more accurate. There was a huge market for COM controls and widget libs and a lot of that Obj-C stuff came with a price tag.
Size issues and bloat can be solved by tree shaking which is orthogonal to granularity of the package ecosystem. It doesn't matter for server side (at least people don't care). On client side, most ecosystems have a way to do it. Dart does it. Android does it with proguard.
The more pressing issue with dependencies is supply chain risks including security. That's why larger organizations have approval processes for using anything open source. Unfortunately the new crop of open source projects in JS and even Go seem to suffer from "IDGAF about what shit code from internet I am pulling" syndrome.
Unfortunately granularity does not solve that as long as your 1000 functions come from 1000 authors on NPM.
A consideration that is often overlooked is that the waste accumulates exponentially!
If each layer of “package abstraction” is only 50% utilised, then each layer multiplies the total size by 2x over what is actually required by the end application.
Three layers — packages pulling in packages that pull their own dependencies — already gets you to 88% bloat! (Or just 12% useful code)
An example of this is the new Windows 11 calculator that can take several seconds to start because it loads junk like the Windows 10 Hello for Business account recovery helper library!
Why? Because it has currency conversion, which uses a HTTP library, which has corporate web proxy support, which needs authentication, which needs WH4B account support, which can get locked out, which needs a recovery helper UI…
…in a calculator. That you can’t launch unless you have already logged in successfully and is definitely not the “right place” for account recovery workflows to be kicked off.
But… you see… it’s just easier to package up these things and include them with a single line in the code somewhere.
if only we had a system that we could all operate on with a standard set of tools that would take care of shared resource access like this.
Dead code elimination means binary size bloat does not follow from dependency bloat. So this point is pretty much invalid for a compiled language like Rust.
Dead code elimination is exactly the same as the halting problem. It’s approximate (and hopefully conservative!) at best.
No, dead code elimination in a statically-dispatched language is not equivalent to the halting problem.
I can't remember the last time I saw someone so conclusively demonstrate they know nothing about the basics of how libraries, compilers, and linkers work.
Agreed it’s a problem and I can’t propose a solution other than something you’ve suggested which is referencing functions by their value (tldr hashing them) kinda like what Unison(?) proposes.
But I think the best defense against this problem at the moment is to be extremely defensive/protective of system dependencies. You need to not import that random library that has a 10-line function. You need to just copy that function into your codebase. Don't just slap random tools together. Developing libraries in a maintainable and forward-looking manner is the exception, not the rule. Some ecosystems excel here, but most fail. Ruby and JS are probably among the worst. Try upgrading a Rails 4 app to modern tooling.
So… be extremely protective of your dependencies. Very easy to accrue tech debt with a simple library installation. Libraries use libraries. It becomes a compounding problem fast.
Junior engineers seem to add packages to our core repo with reckless abandon, and I have to immediately come in and ask why this was needed. Do you really want to break prod some day because you needed a way to print a list of objects as a table in your CLI for dev?
I'm curious if rust has this problem. The problem I notice in npm land is many developers have no taste. Example: there's a library for globbing called glob. You'd think it would just be a function that does globbing, but no, the author decided it should ALSO be a standalone commandline executable, and so it includes a large commandline option parser. They could have easily made a separate commandline tool that includes a library that does the glob, but no, this is a common and shit pattern in npm. I'd say easily 25% or more of all "your dependencies are out of date" messages are related to the argument parsing for the commandline tool in these libraries. That's just one example.
Also there's arguably design. Should a 'glob' library actually read the file system and give you filenames or should it just tell you if a string matches a glob and leave the rest to you? I think it's better design to do the latter, the simplest thing. This means fewer dependencies and more flexibility. I don't have to hack it or add an option to use my own file system (like for testing). I can use it with a change monitoring system, etc...
And, I'm sure there are tons of devs that like that glob is a "do everything for me" library instead of a "do one specific thing" library, which makes it worse, because you get more "internet points" the less your library requires the person using it to be a good dev.
I can't imagine it's any different in rust land, except maybe for the executable thing. There's just too many devs and all of them, including myself, don't always make the best choices.
> Should a 'glob' library actually read the file system and give you filenames
The POSIX glob function after which these things are named traverses the filesystem and matches directory entries.
The pure matching function which matches a glob pattern against a filename-like string is fnmatch.
But yes, the equivalent of fnmatch should be a separate module and that could be a dependency of glob.
Nobody should be trying to implement glob from scratch using a fnmatch-like function and directory traversal. It is not so trivial.
glob performs a traversal that is guided by the pattern. It has to break the pattern into path components. It knows that "*/*/*" has three components and so the traversal will only go three levels deep. Also "dir/*" has a component which is a fixed match, and so it just has to open "dir" without scanning the current directory; if that fails, glob has failed.
If the double star ** is supported, which matches multiple components, that's also best if it is likewise integrated into glob.
If brace expansion is supported, that adds another difficulty because different branches of a brace can have different numbers of components, like {*/x,*/*/x,*/*/*/x}. To implement glob, it would greatly help us to have brace expansion as a separate function which expands the braces, producing multiple glob patterns, which we can then break into path components and traverse.
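A deliberately simplified sketch of that component-guided traversal (my own, assuming the crates.io `glob` crate's `Pattern` type for per-component matching); it skips the `dir/*` fast path, `**`, and brace expansion:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Walk one directory level per pattern component, so "*/*/*.rs" never descends
// more than three levels deep.
fn walk(dir: &Path, components: &[&str], out: &mut Vec<PathBuf>) {
    let Some((first, rest)) = components.split_first() else {
        // No components left: everything that got this far is a match.
        out.push(dir.to_path_buf());
        return;
    };
    let Ok(pattern) = glob::Pattern::new(first) else { return };
    let Ok(entries) = fs::read_dir(dir) else { return };
    for entry in entries.flatten() {
        if pattern.matches(&entry.file_name().to_string_lossy()) {
            walk(&entry.path(), rest, out);
        }
    }
}

fn main() {
    let components: Vec<&str> = "*/*/*.rs".split('/').collect();
    let mut matches = Vec::new();
    walk(Path::new("."), &components, &mut matches);
    for m in &matches {
        println!("{}", m.display());
    }
}
```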
They eventually fixed it, but grunt once upon a time used a glob implementation that could not short-circuit on wildcards in ignore patterns. So I caught it scanning the node_modules directory and then dropping every file it found because it matched "node_modules/**". Builds got a lot faster when I pushed that update out.
There’s a lot of stupid ways to implement glob and only a couple of smart ones.
> But yes, the equivalent of fnmatch should be a separate module and that could be a dependency of glob.
Interesting, lets look at fnmatch: https://pubs.opengroup.org/onlinepubs/9699919799/functions/f...
Well, fnmatch really does two things: it parses the pattern and then applies it to a string. So really, there should be a "ptnparse" library that handles the pattern matching, which fnmatch takes as a dependency.
Though, thinking it through, the "ptnparse" library is responsible for patterns matching single characters and multiple characters. We should split that up into "singleptn" and "multiptn" libraries that ptnparse can take as dependencies.
Oh, and those flags that fnmatch takes makes fnmatch work in several different ways, let's decompose those into three libraries so that we only have to pull in the matcher we care about: pthmatch, nscmatch, and prdmatch. Then we can compose those libraries based on what we want in fnmatch.
This is perfect, now if we don't care about part of the fnmatch functionality, we don't have to include it!
/s
This decomposition is how we wind up with the notorious leftpad situation. Knowing when to stop decomposing is important. fnmatch is a single function that does less than most syscalls. We can probably bundle that with a few more string functions without actually costing us a ton. Glob matching at a string level probably belongs with all the other string manipulation functions in the average "strings" library.
Importantly, my suggestion that fnmatch belongs in a "strings" library does align with your suggestion that fnmatch shouldn't be locked into a "glob" library that also includes the filesystem traversal components.
> I can't imagine it's any different in [R]ust land
Taste is important; programmers with good architectural taste tend to use languages that support them in their endeavour (like Rust or Zig) or at least get out of the way (C).
So I would argue the problems you list are statistically less often the case than in certain other languages (from COBOL to JavaScript).
> There's just too many devs and all of them, including myself, don't always make the best choices.
This point you raise is important: I think an uncoordinated crowd of developers will create a "pile of crates" ("bazaar" approach, in Eric Raymond's terminology), and a single language designer with experience will create a more uniform class library ("cathedral" approach).
Personally, I wish Rust had more of a "batteries included" standard library with systematically named and namespaced official crates (e.g. including all major data structures) - why not "stdlib::data_structures::automata::weighted_finite_state_transducer" instead of a confusing set of choices named "rustfst-ffi", "wfst", ... ?
Ideally, such a standard library should come with the language at release. But the good news is it could still be devised later, because the Rust language designers were smart enough to build versioning with full backwards compatibility (but not technical debt) into the language itself. My wish for Rust 2030 would be such a stdlib (it could even be implemented using the bazaar of present-day crates, as long as that is hidden from us).
We don't need to speak in hypotheticals, we can just look at the glob crate: https://crates.io/crates/glob
213M downloads, depends on zero external crates, one source file (a third of which is devoted to unit tests), and developed by the rust-lang organization itself (along with a lot of crates, which is something that people tend to miss in this discussion).
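Its API also happens to illustrate the design split discussed above; a quick sketch (paths made up), assuming the crate's `Pattern::matches` and `glob::glob` entry points:

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Pure string matching: no filesystem access at all.
    let pattern = glob::Pattern::new("*.rs")?;
    assert!(pattern.matches("parser.rs"));
    assert!(!pattern.matches("parser.c"));

    // Filesystem traversal: walks directories and yields matching paths.
    for entry in glob::glob("src/**/*.rs")? {
        println!("{}", entry?.display());
    }
    Ok(())
}
```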
Which glob crate? https://crates.io/search?q=glob
I went to page 8 and there were still glob libraries.
The one that shows up first, which is to say, the one with 200 million downloads, which is to say, the one whose name is the exact match for the search query.
That's much more a statement about the search function on crates.io than it is the number of glob crates. I think if you have the standard glob crate as a dependency you show up in that search.
glob was just an example. They weren't asking about a specific crate.
Also this crate is from official rust lang repo, so much less prone to individualistic misbehaving. A bad example all around.
> Also this crate is from official rust lang repo, so much less prone to individualistic misbehaving.
To reiterate, lots of things that people in this thread are asking the language to provide are in fact provided by the rust-lang organization: regex, serde, etc. The goalposts are retreating over the horizon.
Rust's primary sin here is that it makes dependency usage transparent to the end-user. Nobody wants to think about how many libraries they depend upon and how many faceless people it takes to maintain those libraries, so they're uncomfortable when Rust shows you. This isn't a Rust problem, it's a software complexity problem.
I think the parent was suggesting comparing and contrasting the glob dependency in rust, and npm. The one off isn't useful, but picking ten random, but heavily used packages probably is. The parent didn't really mention what the node version looked like though.
The npm glob package has 6 dependencies (those dependencies have 3+ dependencies, those sub dependencies have 6+ dependencies, ...)
As you point out the rust crate is from the official repo, so while it's not part of the standard library, it is maintained by the language maintenance organization.
Maybe that could make it a bad example, but the npm one is maintained by the inventor of npm, who describes himself as "I wrote npm and a pretty considerable portion of other node related JavaScript that you might use.", so I would say that makes it a great example, because the people I would expect to care the most about the language are the package maintainers of these packages, who are (hopefully) implementing what they think are the best practices for the language and its ecosystem.
Finding a single library that avoids the problem is pretty useless. You can find great libraries in Node as well but everyone would agree that Node has a dependency problem.
And yet it's telling that, when the author mused about library quality and unknowingly suggested an arbitrary library as an example, the Rust version turned out to be high quality.
Historically, the borrow checker has been a good shield against developers that have no taste.
Not sure how long that’ll last.
Dynamically typed languages do the opposite.
Not really, there are plenty of large libraries today that were designed by complete boneheads. Granted, you only notice if you know that domain very well.
macros seem to compensate for that. There's definitely a "C++ template programming" vibe in some libraries.
It's worth pointing out that Node has a built in globbing function: https://nodejs.org/docs/latest-v24.x/api/fs.html#fspromisesg...
> Also there's arguably design. Should a 'glob' library actually read the file system and give you filenames or should it just tell you if a string matches a glob and leave the rest to you?
There's a function in Node's stdlib that does this as well (albeit it's marked as experimental): https://nodejs.org/docs/latest-v24.x/api/path.html#pathmatch...
Bun has a glob too https://bun.sh/docs/api/glob
What you’re describing regarding glob is not lack of taste, it’s an architectural “bug”.
Taste is what Steve Jobs was referring to when he said Microsoft had none. In software it’s defined by a humane, pleasant design that almost(?) anybody can appreciate.
Programming languages cannot be tasteful, because they require time and effort to learn and understand. Python has some degree of elegance, and Golang's simplicity has a certain je ne sais quoi… but they're not really fitting the definition.
Still, some technologies such as git, Linux or Rust stand out as particularly obscure even for the average developer, not just average human.
> The problem I notice in npm land is many developers have no taste.
Programming is not the same as hanging out some hoity-toity art gallery. If someone critiqued my software dev by saying I had "no taste", I'd cringe so hard I'd turn into a black hole.
I know this is hackernews, but this reeks of self-importance.
Engineering is a form of art where the engineer makes many decisions, large and small, where optimality cannot be proven. Taste most certainly plays a role, and there are engineering products that clearly show good or poor taste.
Unfortunately this particular art form requires fluency in mathematics and the sciences/computers, so it’s very inaccessible.
> Taste most certainly plays a role, and there are engineering products that clearly show good or poor taste.
No. Get over yourself.
Case in point.
Imagine if a carpenter or house builder was shitting out slop that had no taste. And then laughed at people who pointed it out. Would you hire them to build something for you?
This is a problem with SE culture.
Actually, the problem with SE culture is people think they're way smarter than they really are simply because they grew up being called a genius by knowing how to turn a computer on and off.
> Imagine if a carpenter or house builder was shitting out slop that had no taste
What are you even talking about? How does this even remotely relate to software development? Are you telling me a function that adds 2 numbers has "taste"?
Yeah that's one huge advantage Rust has over NPM - Rust developers are a lot more skilled and crates are generally much higher quality.
Random remark: I've noticed the quality of rust libraries too. Which made me really surprised to see the overengineered mess that is the async-openai crate.
How can one take an API as simple as OpenAI's and turn it into this steaming pile of manure? In the end, I used reqwest and created my queries manually. I guess that's what everyone does...
The kinds of people who think OpenAI's tech is worth touching with a bargepole are generally not the kinds of people who develop and maintain high-quality Rust libraries.
I get it, but OpenAI being the hottest thing happening in software for the past x years, I would have assumed there was some kind of official, correctly maintained client for Rust.
I was a bit shocked to be honest.
Edit: I originally misread your comment. OpenAI is an important tech, no matter what you think of the company itself. Being able to easily interface with their API is important.
Maybe that was true back when Rust wasn't mainstream on social media or all over tech influencer videos, but it's not true anymore.
https://crates.io/search?q=is-even
Oh no, people in the Rust community make jokes, how unprofessional!!111
You'll notice these packages are not actually used by anything.
That's a joke. Leftpad wasn't.
A true enough statement, but "Rust" is unnecessarily specific. Dependencies are getting scary in general. Supply chain attacks are no longer hypothetical, they're here and have been for a while.
If I were designing a new language I think I'd be very interested in putting some sort of capability system in so I can confine entire library trees safely, and libraries can volunteer somehow what capabilities they need/offer. I think it would need to be a new language if for no other reason than ecosystems will need to be written with the concept in them from the beginning.
For instance, consider an "image loading library". In most modern languages such libraries almost invariably support loading images from a file, directly, for convenience if nothing else. In a language that supported this concept of capabilities it would be necessary to support loading them from a stream, so either the image library would need you to supply it a stream unconditionally, or if the capability support is more rich, you could say "I don't want you to be able to load files" in your manifest or something and the compiler would block the "LoadFromFile(filename)" function at compile time. Multiply that out over an entire ecosystem and I think this would be hard to retrofit. It's hugely backwards incompatible if it is done correctly, it would be a de facto fork of the entire ecosystem.
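A rough sketch of that shape in today's Rust (the decoder is hypothetical, not a real crate API): the library only accepts a caller-supplied `Read`, so the caller decides whether the bytes come from a file, a socket, or memory, and a capability checker would only need to audit the application, not the library.

```rust
use std::io::Read;

// Hypothetical decoder: it can parse bytes, but it has no way to open files itself.
fn decode_image<R: Read>(mut input: R) -> std::io::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    input.read_to_end(&mut bytes)?;
    // Real decoding would happen here; we just hand back the raw bytes.
    Ok(bytes)
}

fn main() -> std::io::Result<()> {
    // Only the application holds the "open a file" capability...
    let file = std::fs::File::open("picture.png")?;
    let pixels = decode_image(file)?;

    // ...and it could just as well pass in an in-memory buffer instead.
    let from_memory = decode_image(&b"not really a png"[..])?;

    println!("{} bytes from disk, {} from memory", pixels.len(), from_memory.len());
    Ok(())
}
```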
I honestly don't see any other solution to this in the long term, except to create a world where the vast majority of libraries become untargetable in supply chain attacks because they can't open sockets or read files and are thus useless to attackers, and we can reduce our attack surface to just the libraries that truly need the deep access. And I think if a language came out with this design, you'd be surprised at how few things need the dangerous permissions.
Even a culture of minimizing dependencies is just delaying the inevitable. We've been seeing Go packages getting supply-chain-attacked and it getting into people's real code bases, and that community is about as hostile to large dependency trees as any can be and still function. It's not good enough.
You want a special purpose language.
In your particular example of image loading, you want WUFFS. https://github.com/google/wuffs
In WUFFS most programs are impossible. Their "Hello, world" doesn't print hello world because it literally can't do that. It doesn't even have a string type, and it has no idea how to do I/O so that's both elements of the task ruled out. It can however, Wrangle Untrusted File Formats Safely which is its sole purpose.
I believe there should be more special purpose languages like this, as opposed to the General Purpose languages most of us learn. If your work needs six, sixteen or sixty WUFFS libraries to load different image formats, that's all fine because categorically they don't do anything outside their box. Yet, they're extremely fast because since they can't do anything bad by definition they don't need those routine "Better not do anything bad" checks you'd write in a language like C or the compiler would add in a language like Rust, and because they vectorize very nicely.
Java and the .NET Framework had partial trust/capabilities mechanisms decades ago. No one really used them and they were deprecated/removed.
It was not bad, but without memory/cpu isolates, it was pretty useless. The JSR for isolation got abandoned when Sun went belly up.
It was more like no one used them correctly.
Wouldn't that mean they were poorly implemented. If no one uses something correctly, seems like that isn't a problem with the people but the thing.
I don't think so. Software is maybe the only "engineering" discipline where it is considered okay to use mainstream tools incorrectly and then blame the tools.
Do the “mainstream” tools change every five years in other disciplines?
To be fair they only change when chasing trends, consumers don't care how software is written, provided it does the job.
Which goes both ways, it can be a Gtk+ application written in C, or Electron junk, as long as it works, they will use it.
Partially yes, hence why they got removed, the official messaging being OS security primitives are a better way.
Maybe we need a stronger culture of Sans-IO dependencies in general, to the point of pointing out and criticising IO use the way we do with bad practices and dark patterns. A new lib (which shouldn't be using its own file access code) is announced on HN, and the first comment is: "why do you do your own IO?"
Edit - note it's just tongue in cheek. Obviously libraries being developed against public approval wouldn't be much of a good metric. Although I do agree that a bit more of a common culture around the Sans-IO principles would be a good thing.
I don't think retrofitting existing languages/ecosystems is necessarily a lost cause. Static enforcement requires rewrites, but runtime enforcement gets you most of the benefit at a much lower cost.
As long as all library code is compiled/run from source, a compiler/runtime can replace system calls with wrappers that check caller-specific permissions, and it can refuse to compile or insert runtime panics if the language's escape hatches would be used. It can be as safe as the language is safe, so long as you're ok with panics when the rules are broken.
It'd take some work to document and distribute capability profiles for libraries that don't care to support it, but a similar effort was proven possible with TypeScript.
I actually started working on a tool like that for fun, at each syscall it would walk back up the stack and check which shared object a function was from and compare that to a policy until it found something explicitly allowed or denied. I don't think it would necessarily be bulletproof enough to trust fully but it was fun to write.
I love this idea. There is some reminiscence of this in Rust, but it's opt-in and based on convention, and only for `unsafe` code. Specifically, there's a trend of libraries using `#![deny(unsafe_code)]` (which will cause a compilation error if there is any `unsafe` code in the current crate) and then advertising this to their users. But there's no enforcement, and the library can still add `#[allow(unsafe_code)]` to specific functions.
Perhaps a capability system could work like the current "feature" flags, but for the standard library, which would mean they could be computed transitively.
FYI: `#[forbid(_)]` cannot be bypassed by the affected code (without a never-to-be-stabilised nightly feature meant to be used only in `std` macros).
https://doc.rust-lang.org/rustc/lints/levels.html
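A tiny illustration of the difference (the function is made up): under `forbid`, a downstream `allow` is itself rejected at compile time, whereas under `deny` it would quietly win.

```rust
// Crate root: unlike `#![deny(unsafe_code)]`, this cannot be re-allowed further down.
#![forbid(unsafe_code)]

// Uncommenting the attribute below is a compile error in its own right,
// because `allow` cannot override an earlier `forbid`:
//
// #[allow(unsafe_code)]
// fn sneaky() {
//     unsafe { /* ... */ }
// }

fn main() {
    println!("no unsafe code can be introduced anywhere in this crate");
}
```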
Ah right, forgot about forbid!
I love this idea and I hope I get to work on it someday. I've wanted this ever since I was a starry-eyed teenager on IRC listening to Darius Bacon explain his capability-based OS idea, aptly called "Vapor".
I think it could be possible in Rust with a linter, something like https://github.com/geiger-rs/cargo-geiger . The Rust compiler has some unsoundness issues such as https://github.com/rust-lang/rust/issues/84366 . Those would need fixing or linter coverage.
I've thought about this (albeit not for that long) and it seems like you'd need a non-trivial revamp of how we communicate with the operating system. For instance, allowing a library to "read from a stream" sounds safe until you realize they might be using the same syscalls as reading from a file!
That's one hell of a task. The first question is how fine-grained your capability system will be, both in terms of capabilities and of who they are granted to. Not fine-grained enough and everything will need everything: e.g. access to various clocks could be used to DoS you or as a side channel attack, and unsafe memory access might speed up your image parsing but kills all safety. Scope has similar problems: if it's per dependency, it forces library authors to remove useful functionality or break up their library into tiny pieces; if it's per function and module, you'll have a hard time auditing it all. Lastly, it's a huge burden on devs to accurately communicate why their library/function needs a specific capability. We know from JavaScript engines, containerization and WASM runtimes what's actually required for running untrusted code. The overhead is just too large to do it for each function call.
I don't think you need to get very complex to design a language that protects libraries from having implicit system access. If the only place that can import system APIs is in the entry program, then by design libraries need to use dependency injection to facilitate explicit passing of capabilities.
One can take just about any existing language and add this constraint, the problem however is it would break the existing ecosystem of libraries.
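A rough sketch of what that constraint pushes you toward (the trait and types are made up, not an existing framework): the entry program is the only place that constructs the real filesystem capability, and libraries can only do I/O through whatever they are handed.

```rust
use std::io::Read;

// The capability the application may or may not grant to a library.
trait FileSystem {
    fn open(&self, path: &str) -> std::io::Result<Box<dyn Read>>;
}

// Only the entry program constructs this.
struct RealFs;

impl FileSystem for RealFs {
    fn open(&self, path: &str) -> std::io::Result<Box<dyn Read>> {
        Ok(Box::new(std::fs::File::open(path)?))
    }
}

// A "library" function: it can only touch files through what it was given, so a
// wrapper implementation could restrict it to read-only paths, in-memory fakes, etc.
fn load_config(fs: &dyn FileSystem, path: &str) -> std::io::Result<String> {
    let mut buf = String::new();
    fs.open(path)?.read_to_string(&mut buf)?;
    Ok(buf)
}

fn main() {
    match load_config(&RealFs, "app.toml") {
        Ok(cfg) => println!("loaded {} bytes of config", cfg.len()),
        Err(e) => eprintln!("no config: {e}"),
    }
}
```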
If you want this today, Haskell might be the only choice.
Yes, there is a sense in which Haskell's "effect systems" are "capability systems". My effect system, Bluefin, models capabilities as values that you explicitly pass around. You can't do I/O unless you have the "IOE" capability, for example.
https://hackage.haskell.org/package/bluefin
Is there anything in existence which has a version of this idea? It makes a ton of sense to me, but you are right that it would be practically impossible to do in a current language.
Capslock for go
https://github.com/google/capslock
Yes, but you can't enforce this at the language level if your objective is security (at least not for natively-compiled languages). You need OS-level support for capabilities, which some OSes do provide (SeL4, Fuchsia). But if you're in a VM rather than native code then you can enforce capabilities, which is what Wasm does with WASI.
Wasm + wasi let you define hard boundaries between components with explicit interfaces, might be loosely along these lines?
Austral, for example? https://austral-lang.org/spec/spec.html#rationale-cap
Austral is a really cool experiment and I love how much effort was put into the spec which you've linked to. It explains the need for capabilities and linear types, and how they interact, really well.
The .NET Framework did, Windows only (the classic framework, not modern .NET, aka .NET Core).
What if I pass in components from one library with permissions to another library that doesn't have those permissions?
Doesn't Haskell do this to some degree with the IO monad? Functions that are not supposed to do IO directly simply have a more specific type signature, like taking in a stream and returning a buffer for example.
Yes, although it can be violated by unsafePerformIO and friends. Haskell's is not an "assured" system.
Capslock sort of does this with go https://github.com/google/capslock
Interesting. I hadn't seen it yet. I'll check out how fine-grained it really is. My first concern would (naturally) be network calls, but calling a local service should ideally be distinguishable from calling some address that does not originate in the top level.
If anyone ever checks this thread: it works well. Use the JSON output, and it'll show the call path for each "capability" it detects (network, arbitrary code execution, ...). I use that output to organize it into a spreadsheet and scan it quickly.
> I think it would need to be a new language [..]
Languages (plural) ... no single language will work for everyone.
The TypeScript ecosystem supports this! An environment without e.g. file operations will simply be missing the classes needed for them, and your compilation will fail.
This is just a modern problem in all software development, regardless of language. We are doing more complex things, we have a much bigger library of existing code to draw from and there are many reasons to use it. Ultimately a dependency is untrusted code, and there's a long road to go in hardening entire systems to make running arbitrary dependencies safe (if its even possible).
In the absence of a technical solution, all others basically involve someone else having to audit and constantly maintain all that code and social/legal systems of trust. If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.
I'd argue that the severity varies between languages, despite the core problem being universal. Languages with comprehensive standard libraries have an advantage over those with minimal built-in functionality, where people rely on external dependencies even for the most basic things (e.g. see Java/.NET vs JS/Node). Lightweight is not always better.
> Languages with comprehensive standard libraries have an advantage
I don't see the advantage. Just a different axis of disadvantage. Take python for example. It has a crazy big standard library full of stuff I will never use. Some people want C++ to go in that direction too -- even though developers are fully capable of rolling their own. Similar problem with kitchen-sink libraries like Qt. "batteries included" languages lead to higher maintenance burden for the core team, and hence various costs that all users pay: dollars, slow evolution, design overhead, use of lowest common denominator non-specialised implementations, loss of core mission focus, etc.
It's a tradeoff. Those languages also have a very difficult time evolving anything in that standard library, because the entire ecosystem relies on it and expects non-breaking changes. I think Rust gets sort of the best of both worlds: dependencies are so easy to install that it's almost as good as native, but there's a diversity of options and design choices, easy evolution, and winners naturally emerge. These become as high quality as a stdlib component because they attract people/money to work on them, but with more flexibility to change or be replaced.
> If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.
I think Rust really needs to do more of this. I work with both Go and Rust daily at work, Go has its library game down -- the standard library is fantastic. With Rust it's really painful to find the right library and keep up for a lot of simple things (web, tls, x509, base64 encoding, heck even generating random numbers.)
I disagree. As I see it, Rust's core-lib should be for interacting with abstract features (intrinsics, registers, memory, the borrow checker, etc.), and the std-lib should be for interacting with OS features (net, io, threads). Anything else is what Rust excels at implementing, and putting it into the stdlib would restrict the adoption of different implementations.
For example there are currently 3, QUIC (HTTP/3) implementations for rust: Quiche (Cloudflare), Quinn, S2N-QUIC (AWS). They are all spec compliant, but may use different SSL & I/O backends and support different options. 2 of them support C/C++ bindings. 2 are async, 1 is sync.
Having QUIC integrated into the stdlib would mean that all these choices would be made beforehand and be stuck in place permanently, and likely no bindings for other languages would be possible.
Gilad Bracha has a really interesting approach to sandboxing third party libraries: Remove imports, and do everything with dependency injection. That way if you never inject say the IO subsystem, the third party code won't be able to break out. And there's no overhead, since it's all based on capabilities.
Even cooler, if you want to only expose read operations, you can wrap the IO library in another library that only exposes certain commands (or custom filtering, etc).
EDIT: I should say this doesn't work with systems programming, since there's always unsafe or UB code.
That sounds neat, is that Newspeak?
Yep! One of the many cool concepts packed in that language :)
Maybe we should have a way to run every single library we use in an isolated environment and have a structure like QubesOS. Your main code is dom0, and you can create a bunch of TemplateVMs, which are your libraries, and then create AppVMs for using those libraries. Use network namespaces for communicating between these processes. For sensitive workloads (finance, healthcare, etc.), it makes sense to deploy something like that.
Regardless of language, really? I highly doubt that, you don't generally see such problems with C or even C++ because dependencies are more cumbersome to add, especially in a way that's cross-platform.
With C++ it's hilarious because the C++ community is so allergic to proper dependency management and also so desperate for stuff from third party libraries that the committee spends large amounts of its time basically doing dependency management for the community by baking in large features you'd ordinarily take as a dependency into the mandatory standard library.
I'm sure I'll miss some, but IIRC C++ 26 is getting the entire BLAS, two distinct delayed reclamation systems and all of the accompanying infrastructure, new container types, and a very complicated universal system of units.
All of these things are cool, but it's doubtful whether any of them could make sense in a standard library; however, for C++ programmers that's the easiest way to use them...
It's bedlam in there and of course the same C++ programmers who claim to be "worried" that maybe somebody hid something awful in Rust's crates.io are magically unconcerned that copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.
> copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.
Is it really that bad? (By my count, as a point of reference, the Python 3.13 standard library is just under 900k lines for the .py files.)
If something is in the standard library, then it’s written and vetted by the standard library provider, not by a random third party like you make it sound.
With Rust, it’s literally a random third party.
Maintainers of all open source standard libraries are effectively "random third parties". With heavily used ecosystem dependencies (such as Tokio, but also swaths of small libraries, such as `futures` or `regex`), the number of people who have looked at the code and battle-tested it is also huge.
On crates.io, a good heuristic is to look at two numbers: the number of dependents and the number of downloads. If both are high, it's _probably_ fine. Otherwise, I'll manually audit the code.
That's not a complete solution, especially not if you're worried about this from a security perspective, but it's a good approximation if you're worried about the general quality of your dependencies.
what other “quality” is there to worry about besides security?
Stability, correctness, test coverage, performance.
You can lump anything under "security" for particular use cases, but what's the point of words then.
People are paid to work on standard libraries and there’s a whole process behind developing and releasing this software.
Tokio on the other hand is the library whose maintainer decided to download a binary blob during build: https://github.com/tokio-rs/prost/issues/562 https://github.com/tokio-rs/prost/issues/575
Good luck catching such issues across dozens of crates.
The issue you linked is a perfect example in support of my argument. Lots of people noticed the problem, and it was quickly rectified.
> it’s written and vetted by the standard library provider, not by a random third party
All three modern C++ standard libraries are of course Free Software. They are respectively the GNU libstdc++, Clang's libc++ and the Microsoft STL. Because it's a huge sprawling library, you quickly leave the expertise of the paid maintainers and you're into code that some volunteer wrote for them and says it's good. Sounds like random third parties to me.
Now, I'm sure that Stephan T. Lavavej (the Microsoft employee who looks after the STL, yes, nominative determinism) is a smart and attentive maintainer, and so if you provide a contribution with a function named "_Upload_admin_creds_to_drop_box" he's not going to apply that but equally Stephen isn't inhumanly good, so subtle tricks might well get past him. Similar thoughts apply to the GNU and Clang maintainers who don't have funny names.
One Stephan T. Lavavej is worth more than 1000 random GitHub rustaceans, some of whom will be bots, AIs, rank amateurs, bought, or North Korean spies. Each of those standard libraries has one or more Stephans.
Having paid maintainers, code review, test suites, strict contribution guidelines, etc is state of the art for open source software that some transitive crate dependency can only dream to achieve.
> With Rust, it’s literally a random third party.
No, tons of the foundational Rust crates that show up in every dependency tree are first-party crates provided by the Rust project itself.
Because most dependencies are either manually installed by the user, or are dynamic libraries that are provided and audited by the distro maintainers. The dependencies are there, they're just harder to see - https://wiki.alopex.li/LetsBeRealAboutDependencies
Sure, there are various dependencies, but it's nothing like "cargo install crate-name". Cargo makes it so effortless to joink the dumbest dependency for the simplest thing.
On the other hand, C/C++ makes it attractive to reinvent the wheel, or vendor the dependency instead. Rather than a single well-tested implementation in the ecosystem for something like sha256, you end up with every application having its own slightly-different, mostly untested, and essentially unmaintained version.
Applications still need the functionality. The need doesn't magically disappear when installing dependencies is a pain. If a crate has a bug, the entire ecosystem can trivially get the fixed version. If the Stackoverflow snippet a C app is vendoring has a bug, that fix is never getting in the app.
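To make the contrast concrete, here's a hedged sketch using the widely used `sha2` crate (the version number and exact API surface are assumptions; check its docs):

```rust
// Cargo.toml (illustrative): sha2 = "0.10"
use sha2::{Digest, Sha256};

fn main() {
    // One well-tested ecosystem implementation instead of a vendored snippet.
    let hash = Sha256::digest(b"hello world");

    // Render the digest as lowercase hex, byte by byte.
    let hex: String = hash.iter().map(|b| format!("{:02x}", b)).collect();
    println!("{hex}");

    // A fix in sha2 reaches every dependent via `cargo update`;
    // a hand-vendored C snippet has to be patched app by app.
}
```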
That does not help you if the bug is in one of many unmaintained crates and is never noticed. Linux distributions aim to make sure that C applications dynamically link to the right libraries instead of vendoring the code. Then the library can be updated once. IMHO this is the only reasonable approach.
It's trivial to see on crates.io whether a crate is unmaintained.
Maybe if it is completely unmaintained, but this is not enough to solve the problem and maybe also not really the point.
is it trivial to see if a third level dependency is unmaintained?
> Sure, there are various dependencies, but it's nothing like "cargo install crate-name".
You don't install a Rust crate to use it. We have enough people in this thread trying to authoritatively talk about Rust without having any experience with it, please don't bother leaving a comment if you're just going to argue from ignorance.
Sure, I think software that's easy to use is a good thing and Rust dependency management is 100x nicer to work with than C++.
So what's the alternative? I don't think this is only a Cargo/Rust problem, since you can do the same thing in other languages too.
Kind of true, when not using vcpkg/conan.
Don’t forget cmake. (It makes adding dependencies easy, and everything else basically impossible)
Sure, despite all the hate it gets, it is the best experience C and C++ build tools have offered in forever, short of IDE project files, and it includes IDE integration just like those project files.
I thought the whole UNIX mentality was worse is better.
No build tool is without issues. My pain points with cargo are that it always compiles from source, that build caching requires additional work to set up, and that as soon as a project is more than pure Rust, we get a build.rs file that can get quite creative.
And from Internals discussion (https://internals.rust-lang.org/t/add-some-form-of-precompil...) it seems this causes more problems than it solves.
It requires huge amounts of storage for each combination of targets, and even if that were solved, some members of the Rust community would see it as a step back.
Me included. Precompiled artifacts are hard to audit and are a step back from the OSS nature of Rust.
A systems programming language is supposed to support all deployment scenarios, not to be religious.
Wdym? The language supports it; how else could serde have done it?
The issue here is getting storage and compute for build artifacts for cargo. Cargo isn't the language though.
I also don't understand the CMake hate. Modern CMake (3.14+) is just around 10 lines to build basic sources/libraries/executables. And you can either use CMake FetchContent or use CPM https://github.com/cpm-cmake/CPM.cmake to fetch dependencies. No third-party tool like vcpkg or conan is needed.
I think CMake is the perfect balance. You need to write few lines and think about few things before adding a dependency, but usually nothing too crazy. It might not work the first try but that's okay.
> We are doing more complex things
In my experience we have more complex methodologies to the same things, but the goals are not more complex.
Yes, but a lot of the complexity is unnecessary bloat. Almost every project I've ever seen or worked on was full of unnecessary complexity. People naturally tend to over-complicate things, and all the programming books, including software design books, focus on unimportant aspects and miss all the important ones. It's incredibly frustrating.
Yet, if someone were to write a book which explained things properly (probably a 3000 word article would suffice to turn anyone into a 10x dev), nobody would buy it. This industry is cooked.
Do you mean this article?: https://grugbrain.dev/
No. Me write better article.
To quote one famous developer: "Talk is cheap. Show me code!"
I think that https://blessed.rs does a pretty good job of providing recommendations for things that probably can't be crammed into the standard library, but which you'll almost certainly end up needing at one point or another. I honestly like that system a lot, it makes it so that the only packages you need to worry much about are usually doing something rather specific.
Also shout out to cargo-vet.
It lets you track what packages you "trust". Then you can choose to transitively trust the packages trusted by entities you trust.
This lets you have a policy like "importing a new 3rd party package requires a signoff from our dependency tzar. But, packages that Google claim to have carefully reviewed are fine".
You can also export varying definitions of "trust". E.g. Google exports statements like:
- "this package has unsafe code, one of our unsafe experts audited it and thinks it looks OK"
- "this package doesn't do any crypto"
- "this is a crypto library, one of our crypto experts audited it and thinks it looks ok"
https://github.com/google/rust-crate-audits/blob/main/auditi...
Basically it's a slightly more formal and detailed version of blessed.rs where you can easily identify all the "it's not stdlib, but, it's kinda stdlib" stuff and make it easily available to your team without going full YOLO mode.
It can also give you a "semi-YOLO" approach, it supports rationales like "this package is owned by a tokio maintainer, those folks know what they're doing, it's probably fine". I think this is a nice balance for personal projects.
Would love to see something like this for Python.
Did a review; this is solid!
Similar feeling here.
Cargo makes it so simple to add tons of dependencies that it is really hard not to do it. But that does not stop here: even if I try to be careful with adding dependencies, a couple dependencies are likely to pull tens of transitive dependencies each.
"Then don't depend on them", you say. Sure, but that means I won't write my project, because I won't write those things from scratch. I could probably audit the dependency (if it wasn't pulling 50 packages itself), but I can't reasonably write it myself.
It is different with C++: I can often find dependencies that don't pull tens of transitive dependencies in C++. Maybe because it's harder to add dependencies, maybe because the ecosystem is more mature, I don't know.
But it feels like the philosophy in Rust is to pull many small packages, so it doesn't seem like it will change. And that's a pity, because I like Rust-the-language better than C++-the-language. It just feels like I trade "it's not memory-safe" for "you have to pull tons of random code from the Internet".
This was linked from the top comment on the Rust subreddit: https://wiki.alopex.li/LetsBeRealAboutDependencies
I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded. To some degree that is a plus though as you likely trust the maintainers of your OS distribution to provide stable, supported libraries.
As other commenters have said, perhaps this is an area where the Rust maintainers could provide some kind of extended standard library where they don't guarantee backwards compatibility forever, but do provide guarantees about ongoing fixes for security issues.
> This was linked from the top comment on the Rust subreddit: https://wiki.alopex.li/LetsBeRealAboutDependencies
It was also posted here, shortly before this thread: https://news.ycombinator.com/item?id=43934343
(And several times in the past, too.)
> I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.
The point wasn't so much about the loading mechanism, but about the fact that the system (especially on Linux) provides them for you; a good amount come pre-installed, and the rest go through a system package manager so you don't have to worry about the language failing to have a good package system.
> some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.
Not in my case. I manually compile all the dependencies (either because I need to cross-compile, or because I may need to patch them, etc). So I clearly see all the transitive dependencies I need in C++. And I need a lot less than in Rust, by a long shot.
Part of the rust dependency issue is that the compiler only multithreads at the crate level currently (slowly being improved on nightly, but there's still some bugs before they can roll out the parallel compiler), so most libraries split themselves up into a ton of small crates because otherwise they just take too long to compile.
edit: Also, `cargo-vet` is useful for distributed auditing of crates. There's also `cargo-crev`, but afaik it doesn't have buy in from the megacorps like cargo-vet and last I checked didn't have as many/as consistent reviews.
https://github.com/mozilla/cargo-vet
https://github.com/crev-dev/cargo-crev
Can't believe I'd never heard of cargo vet before, that sounds really promising!
> so most libraries split themselves up into a ton of small crates because otherwise they just take too long to compile.
In practice, does this make it feasible to pick and choose the pieces you actually need?
It can do. Additionally, because each part is now smaller it's now easier to ensure that each part, in isolation, does what it says on the tin. It also means that other projects can reuse the parts. An example of the last point would be the Regex crate.
Regex is split into subcrates, one of which is regex-syntax: the parser. But that crate is also a dependency of over 150 other crates, including lalrpop, proptest, treesitter, and polars. So other projects have benefited from Regex being split up.
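For instance, a tool that only needs to analyze regex patterns can (assuming the `regex-syntax` crate's `Parser` API, which may differ between versions) depend on the parser alone and skip the matching engine entirely:

```rust
// Cargo.toml (illustrative): regex-syntax = "0.8", not the full regex crate.
use regex_syntax::Parser;

fn main() {
    // Parse a pattern into its high-level intermediate representation (HIR)
    // without pulling in any of the matching machinery.
    match Parser::new().parse(r"(foo|bar)\d+") {
        Ok(hir) => println!("parsed: {hir:?}"),
        Err(err) => eprintln!("invalid pattern: {err}"),
    }
}
```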
Yes, when done properly. Rust has "feature flags" that can selectively enable dependencies, and effectively act as `#ifdef` guards in the code.
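A small sketch of that pattern (the crate and feature names here are only examples): the optional dependency is declared behind a feature in Cargo.toml, and the code that needs it compiles only when that feature is enabled.

```rust
// Cargo.toml (illustrative):
//   [dependencies]
//   serde_json = { version = "1", optional = true }
//   [features]
//   default = []
//   json = ["dep:serde_json"]

pub struct Report {
    pub name: String,
    pub value: u64,
}

// Always available: plain-text output, no extra dependencies.
pub fn to_text(r: &Report) -> String {
    format!("{} = {}", r.name, r.value)
}

// Compiled (and pulling in serde_json) only when the `json` feature is on.
#[cfg(feature = "json")]
pub fn to_json(r: &Report) -> String {
    serde_json::json!({ "name": r.name, "value": r.value }).to_string()
}
```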
I'll take somewhat unstable dependencies over the total mess of C++ dependencies (CMake, shared libraries, version conflicts, etc.) any time. There's probably also a bit of an illusion about C++ transitive dependencies due to them usually being precompiled (because compiling them is such a pain).
The whole pkgconfig, cmake, autotools etc ecosystem is insane compared to how Rust and Go do things.
It's part of the reason why software distribution on Linux has been pushed toward containers, removing the point of having shared libraries. I think Google, with its C++ replacement (Carbon), plans on doing its own system.
From my point of view, the issue stems from developers wanting to control distribution. That's fine if it's for your own usage, but not really if you're planning for others to use it. You will find the most convoluted build systems just because developers have a pet platform they want to specially support, making it hell to do anything on the others.
It could be better, but the current solutions (npm, go, python,...) favor only the developers, not the maintainers and packagers.
There are examples of maintainers/packagers effectively sabotaging other people's projects when making packages for distros, whether that's shipping them broken, keeping them on ancient versions, etc.
e.g. Bottles, WebkitGTK (distros liked keeping this one held back even though doing so is a security risk)
IMHO it shouldn't be the responsibility of the OS vendor to package third party applications.
Distro maintainers/packagers are who keep the current software stacks running. It's rather amazing how they manage to keep the billion or so lines of separately written code working in unison.
That said, the labor needed to keep the stuff together could be reduced a lot by the more ergonomical and universal packaging and distribution methods like Cargo (and, dare I say, npm). I think some kind of a better bridge between developers and distros could be found here.
> I think some kind of a better bridge between developers and distros could be found here.
Every Tom, Dick, and Harry is making their own distro these days (even if they're just respins of Arch with Calamares and some questionable theme settings), so why add more work onto developers?
We have things like Flatpak and Docker now that let application developers ignore the distros and stop them breaking things, unless you're Ubuntu, who is constantly begging to get purchased by Microsoft.
> I think some kind of a better bridge between developers and distros could be found here.
I don’t think there’s a need to do so. Only discipline is needed, by using stable and mature dependencies, and documenting the building process. And maybe some guides/scripts for the most popular distros.
> It's part of the reason why software distribution on Linux has been pushed to using containers
My understanding of people distributing their software in containers is that they can't be arsed to learn how to do it properly. They would install their software and ship the entire computer if that was cost effective.
What needs to be "learned properly" is sadly a huge pile of incoherent legacy cruft that ideally wouldn't be there at all.
This is not to denigrate the huge and critical effort that makes current computing possible, and that is likely unavoidable in the real world. But software distribution needs to evolve.
> What needs to be "learned properly" is sadly a huge pile of incoherent legacy cruft
I don't find it incoherent, nor huge. Unless the bar for "huge" is "anything that requires more attention than asking an LLM and copy-pasting its answer", maybe.
It's not a case of 'learning to do it properly', it's a case of a huge amount of effort to deal with arbitrary differences between distros, as well as fighting with distro policies that would rather ship the software with known bugs than allow two versions of a library to exist on the system.
> it's a case of a huge amount of effort to deal with arbitrary differences between distros
That is not at all a problem for open source stuff: build your project correctly, and let distros do their job. Still, open source projects are too often doing it wrong, because nobody can be arsed to learn.
> as well as fighting with distro policies that would rather ship the software with known bugs than allow two versions of a library to exist on the system.
Sounds like if you need this, you're doing it wrong. If it's a major update (e.g. 2.3.1 to 3.0.0), it's totally possible to have a new package (say `python2` and `python3`). If your users need two versions of a library that are in the same major version (e.g. 2.3.1 and 2.5.4), then you as a developer are doing it wrong. No need to fight, just learn to do it properly.
> the philosophy in Rust is to pull many small packages
I'm not sure it's a philosophy, more a pragmatic consideration for compilation speeds. Anyone who's done a non-trivial amount of Rust knows that moment when the project gets too big and needs to split into separate crates. It's kinda sad that you can't organize code according to proper abstractions, many times I feel forced to refactor for compiler performance.
> Sure, but that means I won't write my project, because I won't write those things from scratch.
You need to think a bit harder about that, to help you decide whether your position is rational.
This confuses me as well. Is the implied solution to choose a language where you are forced to write those things from scratch?
My point is that if, in the language, everybody were incentivised to use fewer dependencies, then a random library that I would not write myself (because it is an entire project in itself) would have fewer dependencies. Because that is not the case, either I take that library and accept its transitive dependencies, or I don't have a library at all.
In Rust, I'm sometimes actually tempted to wrap a C/C++ library (and its few dependencies) instead of getting the Rust alternative (and its gazillion dependencies).
And you need to think a bit about that (probably not very hard), to help you decide whether I'm irrational or whether you may not have totally understood my point.
I wasted 6 hours yesterday on getting the Bullet examples to compile outside of Bullet itself, with no success. It's more likely that a lot of software simply doesn't get written because C++ and CMake are a pain in the ass.
I find CMake pretty easy, and I only use a few core features of it. Usually the pain comes from completely wrong setups by people who didn't learn the basics. But that's true of everything, I think.
I feel like leftpad has given package managers a very bad name. I understand the OP's hesitation, but it feels a little ridiculous to me.
Tokio is a work-stealing, asynchronous runtime. This is the kind of feature that would be an entire language runtime elsewhere. Does OP consider it reasonable to audit the entire Go runtime, or the V8 engine for Node? V8 is ~10x more lines than Tokio.
If Cloudflare uses Node, would you expect Cloudflare to audit v8 quarterly?
And for what it's worth, people do audit tokio. I have audited tokio. Many times in fact. Sure, not everyone will, but someone will :)
How does one approach doing so? Do you open the main.rs file (or whichever file is the entry point) and start reading the code and referenced functions in a breadth-first search (BFS) manner?
If two different dependencies use a different version of some other dependency between them does cargo still include both versions by default?
This is something I've only ever seen cargo do.
It'll do that if there isn't a single version that meets both requirements. Which is a great thing, because most other languages will just fail the build in that case (well, there are still cases where it won't even work in rust, if types from those sub-dependencies are passed in between the two closer dependencies)
> If two different dependencies use a different version of some other dependency between them does cargo still include both versions by default?
No, cargo will resolve using semver compatibility and pick the best version it can. NuGet, for C#, does something very similar.
> This is something I've only ever seen cargo do.
npm does this (which causes [caused?] the node_modules directory to have a megazillion of files usually, but sometimes "hoisting" common dependencies helps, and there's Yarn's PnP [which hooks into Node's require() and keeps packages as ZIPs], and pnpm uses symlinks/hardlinks)
In the past (not in Rust, but other languages), for important systems, I've instituted policies of minimizing dependencies from these language-specific package repositories, and for the ones you do use, having to copy it to our own repos and audit each update before use.
But that's not practical for all situations. For example, Web frontend developer culture might be the worst environment, to the point you often can't get many things done in feasible time, if you don't adopt the same reckless practices.
I'm also seeing it now with the cargo-culting of opaque self-hosted AI tools and models. For learning and experimenting, I'd spend more time sufficiently compartmentalizing an individual tool than with using it.
This weekend, I'm dusting off my Rust skills, for a small open source employability project (so I can't invest in expensive dependency management on this one). The main thing bothering me isn't allocation management, but the sinking feeling when I watch the cast-of-thousands explosion of transitive dependencies for the UI and async libraries that I want to use. It's only a matter of time before one of those is compromised, if not already, and one is all it takes.
Best way is to have CI/CD systems only connected to the official internal repos.
Devs can add whatever they feel like on their workstations but it will be a sad build server if they get pushed without permission.
s/Best way/The only safe way/
Anything else will get abused in the name of expediency and just-this-one-time.
Also, the process for adding a crate/gem/module/library needs to be the same as anything else: license review, code review, subscription to the appropriate mailing list or other announce channel, and assignment of responsibility. All of these except code review can be really, really fast once you have the process going.
All problems are, at least in part, dependency chain management problems.
I agree that some amount of friction when including third party dependencies is a vital thing to push people to consider the value versus cost of dependencies (and license review, code review, and channel subscriptions are all incredibly important and almost always overlooked), however how should this work for transitive dependencies? And the dependencies of _those_ dependencies?
The dependency trees for most interpreted or source-distributed languages are ridiculous, and review of even a few of those seems practically impossible in a lot of development environments.
You understand the problem clearly, but you haven't put your finger on the solution.
It's an obvious one, but distasteful to many people.
Perhaps the distaste is blinding me.
Would you care to state the obvious very clearly, for the dense ones among us?
> Devs can add whatever they feel like on their workstations
A compromised dev machine is also a problem.
True, hence we can go next level and also deal with limited accounts for developers, and I can tell you most folks on HN would hate to work in such corporate environments.
I'd leave. If I have to beg IT security every other day for something, it's just not worth it. I was in that situation once before and it was endlessly frustrating. It also wasn't even their choice; the CEO dictated it after attending some security talk once upon a time, and then instantly "you can't trust anyone or anything". You can trust my stay there will be short though :)
No doubt, although this is always a job market situation, in many places around the globe being a developer isn't much different from any other office job, where many folks have to be happy to have a job in first place.
There are some voices trying to address this security risk (e.g. the proponents of this new RFC: https://github.com/rust-lang/rfcs/pull/3810). However, for some reason (probably culture) there isn't much momentum yet to change the status quo.
> isn't much momentum yet to change the status quo.
it's a complex problem with tons of partial solutions, each of which has tons of ways to implement it, often with no clear winner
i.e. it's the kind of problem that's hard to solve by consensus
e.g. the idea of an extended standard library is old (around since the beginning of Rust), but for years it was believed it's probably best to make it a separate, independent project/library, for various reasons. One being that the saying "the standard library is the place where code goes to die" has been quite true for multiple ecosystems (most notably Python)
as a side note, an ESL wouldn't reduce the LOC count, it would increase it, as long as you fully measure LOCs and don't "skip" over some dependencies
The rust RFC process has, frankly, become somewhat of a CF.
There are literally thousands of RFCs for Rust, with only a small handful that have been integrated. Having this forest, IMO, makes it hard for any given proposal to really stand out. Further, it makes duplicated effort almost inevitable.
Rust's RFC process is effectively a dead letter box for most.
I think they could constitute a committee for the RFC review process (in case there is none today), and, based on its recommendations, multiple domain-specific teams/groups could be created to review RFCs in a timely manner.
The cool thing about rust is you can implement async yourself. You aren't tied to any specific implementation.
Except that libraries using different async libraries in Rust seem generally incompatible.
Same in C++, partially true in .NET/C# and F#.
Or not use async at all.
We need a term like “Mature” or similar for dependencies that are done. Mature dependencies have two characteristics:
1. Well defined scope
2. Infrequent changes
Nomad has many of these (msgpack, envparse, cli, etc). These dependencies go years without changing so the dependency management burden rapidly approaches zero. This is an especially useful property for “leaf” dependencies with no dependencies of their own.
I wish libraries could advertise their intent to be Mature. I’d choose a Mature protobuf library over one that constantly tweaked its ergonomics and performance. Continual iterative improvement is often a boon, but sometimes it’s not worth the cost.
Java did this sometimes by essentially adding slightly tidied-up versions of whatever was the de-facto standard to the standard library. Java 1.3 didn't have regexes, but most people were using the same Apache Commons thing, so Java 1.4 added regexes that looked exactly like that. Java's date handling was a pain, so people mostly used Joda-Time; a later Java version added something that mostly works like Joda-Time. Etc.
It is an easy way to get a somewhat OK standard library as the things you add became popular on their own merits at some point.
Once added, the lowest friction path is to just use the standard library; and as it is the standard library you have a slightly better hope someone will care to maintain it. You can still build a better one if needed for your use-case, but the batteries are included for basic usage
I have a lot of sympathy for this viewpoint, but I also ask that we try to remind ourselves. We are asking for professionalism from hobby projects.
If you want a mature protobuf implementation you should probably buy one. Expecting some guy/gal on the internet to maintain one for you for free seems ill-advised.
> I have a lot of sympathy for this viewpoint, but I also ask that we try to remind ourselves. We are asking for professionalism from hobby projects.
Nobody is asking for professional quality standards from hobby projects. At best, they are asking for hobby projects to advertise themselves as such, and not as "this is a library for [x] that you can use in your stuff with the expectations of [maintenance/performance/compatibility/etc.]."
Resume-driven development seems to cause people to oversell their hobby projects as software that is ready to have external users.
> If you want a mature protobuf implementation you should probably buy one
No software is ever developed this way. For some reason, libraries are always free. Approximately nobody will buy paid libraries.
> For some reason, libraries are always free. Approximately nobody will buy paid libraries.
I suspect this is in no small part because figuring out a licensing (edit: pricing!) model that is both appealing to consumers and sustainable for authors is damn near impossible.
> At best, they are asking for hobby projects to advertise themselves as such
That's also work. You don't get to ask the hobby programmer to do your work of vetting serious/maintained projects for you. As the professional with a job, you have to do that. If some rando on GitHub writes in their readme that it's maintained, but lies. You're the idiot for believing him. He's probably 12 years old, and you're supposedly a professional.
> No software is ever developed this way.
That's just inaccurate. In my day job we pay for at least 3-4 3rd party libraries that we either have support contracts on or that were developed for us along with a support contract. Besides those there's also the myriad of software products, databases, editors, Prometheus, grafana, that we pay for.
Software people really underestimate how much business guys are willing to pay for having somebody to call. It's not "infinitely scalable" in the way VC's love, but it's definitely a huge business opportunity.
To add to this, in the gamedev space there are a bunch of middleware libraries that are commonly paid for: fmod/wwise, multiplayer networking sdks, etc.
Thinking about this a bit more, it seems that the reason there isn't a good way to sell licenses to software libraries generally is license enforcement. Unity and Unreal have a licensing system built-in that they enforce against gamedevs. Normal server software has no such thing.
That means the only threat you have as a producer of code (once the code is handed over) is the threat of withdrawing service. That means the only ways to sell licenses are:
* Build your own licensing service (or offer SaaS)
* Sell the code for a high price upfront
* Sell service contracts
Isn't that an argument _for_ having a "mature" label? To avoid the hobbyists who have no intention to maintain their thing?
Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for
> Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for
There certainly are. I would never say to disregard anything because it was a hobby project. You just don't get to expect it being that way.
My basic point is that a hobby project can never take responsibility. If you have a support contract you are allowed to have some expectation of support. If you do not, then no expectation is warranted and everything you get is a gift.
A "mature" label carries the same problem. You are expecting the author to label something for you. That's work. If you're pulling from the commons, you must respect that people can label stuff whatever they like, and unmotivated blanket lies are not illegal.
yeah, that's a good point
A great point! All of the libraries I mentioned are created and maintained by corporations. Hobbyists, as always, are free to do as they please without judgement from me. :)
I will say I get great satisfaction from the little envparse library I wrote needing near-0 maintenance. It’s a rare treat to be able to consider any project truly done.
I feel like the Go ecosystem almost serendipitously has this built in - modules marked v0.X.Y being immature and under development, and v1 or greater being mature, keeping changes mostly down to bug fixes. I think some folks may even follow this convention!
One of the good things about cargo packages is the feature flags. If a repo uses too many dependencies, then it's time to open an issue or PR to hide them behind feature flags. I do that a lot with packages that require std even though they could do with core and alloc (a rough sketch of that change follows below).
cargo tree helps a lot with viewing the dependency tree. I forget if it does LoC counts or not.
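A rough sketch of that std-to-core/alloc change (the feature name is just the usual convention, and the Cargo.toml side is shown only in comments): the crate opts out of std unless the `std` feature is enabled and pulls in `alloc` for its heap types.

```rust
// Cargo.toml (illustrative):
//   [features]
//   default = ["std"]
//   std = []
#![cfg_attr(not(feature = "std"), no_std)]

// When building without std, use the `alloc` crate for heap types.
#[cfg(not(feature = "std"))]
extern crate alloc;

#[cfg(not(feature = "std"))]
use alloc::{string::String, vec::Vec};

/// Works the same whether or not the caller enabled the `std` feature.
pub fn join_lines(lines: &[&str]) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in lines {
        out.push(line);
    }
    out.join("\n")
}
```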
> to see what lines ACTUALLY get compiled into the final binary,
This doesn't really make much sense, as a lot of the functions that make it into the binary get inlined so heavily that they often become part of the 'main' function.
100%, I still miss feature flags in npm. Is there a package manager that can do this already? I'd love to expand our internal libs with framework-specific code
I once wanted to contribute to the popular swc project (https://github.com/swc-project/swc). I cloned the repo, ran build, and a whopping 20GB was gone from my disk. The parser itself (https://github.com/swc-project/swc/blob/main/crates/swc_ecma...) has over a dozen dependencies, including serde.
Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.
I decided that I should leave this project alone and spend my time elsewhere.
I just built it with "git clone --depth 1 ..." and the build from cargo build --release is 2.9GB (2.3GB in the target folder)?
Rust generates absurd amounts of debug info, so the default debug builds are much much larger.
Zero-cost abstractions don't have zero-cost debug info. In fact, all of the optimized-away stuff is intentionally preserved with full fidelity in the debug info.
You should also add the dev build
I agree that relying on unknown dependencies is a risk, but this misses the point IMO. Number of dependencies and disk space are kind of arbitrary.
> Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.
The lightest weight javascript program relies on V8 to run, which has multiple orders of magnitude more dependencies. Most of which you have never heard of.
At least cargo makes it easier to get a clearer picture of what the dependencies are for a program.
Number of dependencies isn't exactly arbitrary...
If you have one huge dep, it's easier to keep track that you're on the latest version, and it's much less likely you'll fat-finger it and import something typosquatted.
Also, if you're in enterprise, you'll have fewer 100-page SBOM reports.
What is more likely to be vulnerable, a 100k LoC project developed by ten people, or ten 10k LoC single-maintainer projects?
Keeping track of the latest version is trivial with cargo.
Unlike my sibling comment, I don't work in SBOM, but if you consider social dynamics and what trust means, it should be pretty obvious that trusting a group of 10 strangers is much less risky than trusting 10 separate strangers.
consider the probabilities
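A toy way to put numbers on that, assuming each maintainer account is independently compromised with some small probability $p$ and nothing downstream catches it:

$$P(\text{at least one of ten single-maintainer projects is compromised}) = 1 - (1 - p)^{10} \approx 10p \quad \text{for small } p$$

Under the same assumptions, the single ten-person project also has roughly a $1 - (1 - p)^{10}$ chance that _some_ maintainer is compromised, but the malicious change still has to get past the other nine reviewers, so the end-to-end risk is lower.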
I work in SCA/SBOM.
>What is more likely to be vulnerable,
At the end of the day you are at much higher risk of one of those 10 packages getting owned by some external party and the next version suddenly pulling in a bitcoin miner, something that steals everything it can from your CI/CD, or something that does a takeover on your customers.
And it's never 10 (well, at least for JS), it's hundreds, or if your team is insane, thousands.
No, it has very little to do with v8 or any runtime. Those parsers run on any decent and recent enough runtime, including browsers and Node.js. If you look at the actual code, they use basic APIs in the JavaScript language that you can find in almost any other language.
> relies on V8 to run, which has multiple orders of magnitude more dependencies.
Actually, this isn't true. (Or at least wasn't a while back.) I used to work with a bunch of ex-V8 folks and they really despised third-party dependencies and didn't trust any code they didn't write. They used a few third-party libs but for them most part, they tried to own everything themselves.
they are also Google
.. as in they can afford to rewrite everything
.. can afford to suffer from not invented here syndrome
.. and are under _massive_ threat of people doing supply chain attacks compared to most other projects (as they end up running on nearly any desktop computer and half the phones out there)
This just isn't viable for most projects, and not just in terms of resource/time investment: reinventing/rewriting everything isn't exactly good for reducing bugs if you don't have reliable access to both resources _and_ expertise. Most companies have to live with having many very average developers and very tight resource limits.
I think the folks who wrote V8 have always been this way, even before their company got acquired by Google and they switched to writing V8.
I am counting 13 dependencies, the rest are internal ones. Are any of these superfluous or only needed for small edge cases? Serde seems exactly a case where you absolutely should use an external dependency.
Also, repository size seems an extremely irrelevant metric.
13 > 12 so over a dozen dependencies. If you look at acorn or babel/parser, they barely have any dependency.
Repository size is directly related to how long it takes to run a build, which is extremely important if I were to contribute to the project.
> Serde seems exactly a case where you absolutely should use an external dependency.
I can't see any reason a parser has a hard dependency on a serialization library.
>13 > 12 so over a dozen dependencies. If you look at acorn or babel/parser, they barely have any dependency.
Which ones are superfluous?
There are good reasons to use dependencies. If someone has solved a problem you need to solve as well it is pointless to duplicate the effort.
>Repository size is directly related to how long it takes to run a build, which is extremely important if I were to contribute to the project.
Totally false. There is zero inherent relation.
>I can't see any reason a parser has a hard dependency on a serialization library.
And because you can't see a reason there is none?
It is totally meaningless to talk about any of this if you can not point out why this is superfluous.
I don't think there is any point in debating this, because apparently you are in the camp of "dependencies are ok", with or without a good reason, when a different camp is "avoid dependencies unless you really have to". You just provided an example of why dependencies explode like this.
> And because you can't see a reason there is none?
Somehow every other JS based parser doesn't do fancy serialization, as far as I can tell. You can come up with reasons of why one might need it, but as a user of the parser, I want the footprint to be small, and that's a requirement. In fact, that's one of the reasons I never used swc parser in my serious projects.
You are just making stuff up. You still can not articulate why these dependencies are unnecessary.
That you in particular might have no use for the features they bring couldn't be more irrelevant. What other parsers are doing could also not be more irrelevant.
> You still can not articulate why these dependencies are unnecessary.
No, because I don't have to answer that question. I can simply choose not to use this project, like what I do with npm projects. There is a project that's 500kb in code with 120 dependencies, when another one is 100kb with 10 dependencies that's also well maintained? I'll choose the latter without question, as long as it satisfies my needs. I don't care why the other one has 120 dependencies or try to justify that.
Why are you complaining that a project you do not care about is using 13 dependencies, all of which, to your knowledge, are absolutely essential for the functionality?
>There is a project that's 500kb in code with 120 dependencies
And therefore some project using 13 dependencies is doing it wrong? What are you on about. Obviously there is an enormous abuse of dependencies in the JS ecosystem, who cares?
Their original complaint was about the project taking 20GB of disk space to compile.
Also they did point out that the parser depends on a serialisation library, so you're also mistaken about parent thinking the dependencies are necessary.
On another note, this pervasive kind of passive aggressive, hand-wavy, tribalistic, blind defense of certain technologies speak volumes about their audiences.
Please actually read the conversation before commenting.
To address a point near the end of the article, here is my [partial] solution that works as a baseline.
Curate a collection of libraries you use and trust. This will probably involve making a number of your own. Wheel-reinvention, if you will. If done properly, even the upfront time cost will pay off in the long run. I am in the minority here, but I roll my own libs whenever possible, and the 3rd party libs I use are often ones I know, have used before, and have vetted to confirm they have a shallow dependency tree of their own.
Is this sustainable? I don't know. But it's the best I've come up with, in order to use what I see as the best programming language available for several domains.
There are a lot of light-weight, excellent libs I will use without hesitation, and that have wide suitability. Examples:
Heavier, and periodically experiencing mutual-version hell, but very useful for GUI programs:
On a darker note, the Rust web ecosystem may be permanently lost to async and messy dependencies. Embedded is going that way too, but I have more hope there, and am doing my best to have my own tooling.
Rust really made some unfortunate choices with async: it pollutes everything, but isn't generic enough, so now you are married to the runtime. This bifurcates the whole ecosystem. It is nearly the Phobos/Deimos problem from D, but instead Tokio just took over. One doesn't use Rust anymore, they use Tokio.
Rust will thrive despite the PLT coloring debate. Async frameworks often dominate through winner-takes-all dynamics. Most blog posts on async coloring are pretentious nonsense, and I've faced heavy moderation here for calling out their intellectual bankruptcy. The completely brain dead moralizing arguments from the ignorant deserve intense derision regardless of what HN's official rules are.
Real world software ecosystems evolve slowly, requiring years of debate to shift.
- From HN's most outspoken Rust critic
GP's complaint wasn't about the coloring, but about the fact that the basic async API is not enough for most tasks, so you don't only have colored functions, you're now also bound to an async runtime. The world would be much better if most async rust code was agnostic of the async runtime, despite still having the colored functions issue.
Sure, the situation is just vastly better - even now - than standardizing the wrong solution. Each widely used async runtime in Rust has different solutions for a number of problems, and the choice isn't obvious.
For example, `tokio::spawn()` returns a task handle that lets the task keep running after the handle is dropped. `smol::spawn()` cancels the task when the task handle is dropped.
General async cancellation requires infrastructure mechanisms, and there are multiple reasonable designs.
Letting things settle in the ecosystem is a great way to find the best design to eventually incorporate in the standard library, but it takes time.
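A hedged sketch of that tokio-vs-smol difference (check each runtime's current docs; the exact semantics may shift between versions):

```rust
// With Tokio: dropping the JoinHandle detaches the task, which keeps running.
async fn tokio_style() {
    let handle = tokio::spawn(async {
        // ... background work ...
    });
    drop(handle); // the spawned task is NOT cancelled
}

// With smol: dropping the Task cancels it; call .detach() to keep it running.
async fn smol_style() {
    let task = smol::spawn(async {
        // ... background work ...
    });
    task.detach(); // without this, dropping `task` would cancel the work
}
```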
I think it's a "cultural" thing. With Go you often find developers/projects proudly mentioning that no, or just a few, non-std dependencies are used. Coming from Go it really feels strange when you see pages of dependencies scrolling over your screen when you build a Rust project.
Go has a fatter standard library and a "fat" runtime with built-in green threads (an asynchronous runtime basically) and garbage collection, so you get more out of the box and thus end up using fewer dependencies.
I have yet to come across a go project that doesn't pull in tons of 3rd party code as well. It seems like maybe you're over-stating the "culture" a bit.
> I have yet to come across a go project that doesn't pull in tons of 3rd party code as well.
These have Zero dependencies. It's not rare in Go land.
- https://github.com/go-chi/chi 19k stars
- https://github.com/julienschmidt/httprouter 16k stars
- https://github.com/gorilla/mux 21k stars
- https://github.com/spf13/pflag 2.6k stars
- https://github.com/google/uuid 5.6k stars
Many others have just a few dependencies.
Yeah, while I’ve seen some great libraries that follow the practice of minimizing their dependencies, I’m a bit annoyed with the amount of dependencies that docker will bring along [1]. I’ve been on the lookout for alternatives for my docker needs, but the state of podman, buildah and some others that I checked is similar. They all bring in roughly the same number of dependencies… if anyone knows of a stripped down Go lib that can be used to build from a Dockerfile, pull, and run a container, I would be grateful for any suggestions. Heck docker / moby isn’t even using go.mod proper.
[1] https://github.com/moby/moby/blob/master/vendor.mod
Wow, that's massive. I guess it's inevitable that a popular piece of open-source software for end-users will be compelled to accrue dependencies due to popular demand for features that require them.
I feel Telegraf made a good compromise: out of the box, it comes with a _ton_ of stuff[1] to monitor everything, but they make it possible to build only the pieces that you need via build tags, and even provide a tool to extract said tags from your telegraf config[2]. But lots of supply-chain security stuff assumes everything in go.mod is used, so that can result in a lot of noise.
[1] https://github.com/influxdata/telegraf/blob/master/go.mod [2] https://github.com/influxdata/telegraf/tree/master/tools/cus...
Thanks! That’s an interesting approach. Haven’t seen that before. I think a better approach (in a monorepo) might be to use separate go.mod files for each module, allowing the user to configure only the needed parts separately. But I haven’t seen it used much.
> What's the solution?
Big things you use off-the-shelf libraries for. Small things you open-code, possibly by cribbing from suitably-licensed open source libraries. You bloat your code to some degree, but reduce your need to audit external code and reduce your exposure to supply chain attacks. Still, the big libraries are a problem, but you're not going to open code everything.
This isn't just Rust. It's everything.
> Big things you use off-the-shelf libraries for.
I should have added: "and for obvious reasons".
Didn't computer science hype up code reuse for decades before it finally started happening on a massive scale? For that to actually happen we needed programming languages with nice namespaces and packaging and distribution channels. C was never going to have the library ecosystem that Java, C++, and Rust have. Now that we're there suddenly we have a very worrisome supply chain issue, with major Reflections on Trusting Trust vibes. What to do? We can't all afford to open-code everything, so we won't, but I recommend that we open-code all the _small_ things, especially in big projects and big libraries. Well, or maybe the AI revolution will save us.
I wonder how much good a “dependency depth” label on packages would do, at the crates.io level. Like, a package can only depend on a package with a lower declared dependency depth than it, and packages compete to have a low dependency depth as a badge.
Every time I daydream about my perfect language, this is one of the features I think about.
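A toy sketch of what that "dependency depth" number could mean, computed over a resolved dependency graph (this isn't a crates.io feature; the graph and names below are made up):

```rust
use std::collections::HashMap;

/// Depth 0 = no dependencies; otherwise 1 + the depth of the deepest dependency.
fn depth(pkg: &str, graph: &HashMap<&str, Vec<&str>>, memo: &mut HashMap<String, usize>) -> usize {
    if let Some(&d) = memo.get(pkg) {
        return d;
    }
    let mut d = 0;
    if let Some(deps) = graph.get(pkg) {
        for dep in deps {
            d = d.max(depth(dep, graph, memo) + 1);
        }
    }
    memo.insert(pkg.to_string(), d);
    d
}

fn main() {
    // Hypothetical resolved graph: app -> {http, log}, http -> {url}.
    let graph: HashMap<&str, Vec<&str>> = HashMap::from([
        ("app", vec!["http", "log"]),
        ("http", vec!["url"]),
        ("url", vec![]),
        ("log", vec![]),
    ]);
    let mut memo = HashMap::new();
    println!("depth of app = {}", depth("app", &graph, &mut memo)); // prints 2
}
```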
I recently wrote an extremely basic Rust web service using Axum. It had 10 direct dependencies for a total of 121 resolved dependencies. I later rewrote the service in Java using Jetty. It had 3 direct dependencies for a total of 7 resolved dependencies. Absolutely nuts.
I don't think number of dependencies is a useful comparison metric here. Java runtime already implements stuff that you have to use libraries for in Rust, and it's a design choice. Rust also has slimmer std. Both languages have different constraints for this.
> dotenv is unmaintained.
How much maintenance could you possibly need to load secrets from .env into the environment?
I agree with your general point, but for this specific functionality, I'll point out that setting environment variables of the current process is unsafe. It took us a long time to realize it, so the function wasn't actually marked as unsafe until the Rust 2024 edition.
What this means in practice is that the call to invoke dotenv should also be marked as unsafe so that the invoker can ensure safety by placing it at the right place.
If no one is maintaining the crate, that won’t happen and someone might try to load environment variables at a bad time.
ok, I'm hooked - how is setting an env var in the current process unsafe? My gut says it's not unsafe in a memory-ownership sense, but rather in a race condition sense?
whatever the issue is, "setting an env var is unsafe" is so interesting to me that I'm now craving a blog post explaining this
It's a long-standing bug: setenv and unsetenv are not thread-safe.
https://www.evanjones.ca/setenv-is-not-thread-safe.html
I honestly think using setenv is just a terrible idea.
Can you elaborate on what the simplest alternative is?
Simple: you don't set any env vars after starting new threads.
https://doc.rust-lang.org/std/env/fn.set_var.html#safety
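For reference, a minimal sketch of what this looks like under the Rust 2024 edition (the variable name is just an example):

```rust
fn main() {
    // In the 2024 edition std::env::set_var is unsafe: the caller must ensure
    // no other thread is reading or writing the environment at the same time,
    // because getenv/setenv are not thread-safe on many platforms.
    unsafe {
        std::env::set_var("APP_LOG", "debug");
    }

    // Only spawn threads that touch the environment after that point.
    let handle = std::thread::spawn(|| std::env::var("APP_LOG").ok());
    println!("{:?}", handle.join().unwrap());
}
```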
I find it hilarious when people judge the quality of a repository by how many commits it has, as if 10,000 commits meant the code is better.
The maintainers themselves give this warning in the repo's README, so even if it were maintained, it still wouldn't be production ready.
> Achtung! This is a v0.* version! Expect bugs and issues all around. Submitting pull requests and issues is highly encouraged!
https://github.com/dotenv-rs/dotenv
That is an escape hatch that is seemingly used everywhere. Nobody wants to release a 1.0 with backwards compatibility guarantees.
ZeroVer https://0ver.org/
Ironically a project that hasn't been changed in a while "unmaintained" is a good candidate for bumping to v1, while a project with new breaking commits every day is a bad candidate.
On the other hand, loading .env into the environment is critical (since you are usually passing secrets through .env). I wouldn't want to maintain that myself rather than share it with xxK other projects, in case there is a vulnerability.
The issue is that loading and setting env vars (which is the default for dotenv libraries)
_is fundamentally unsound thanks to Unix/POSIX_
and there is no way around that.
Hence set_var wasn't marked as unsafe, _even though it not being fully safe has been known since the extremely early Rust days, maybe even 1.0_.
It not being unsafe wasn't an oversight but a known, not-fully-sound design decision, which has been revisited and changed recently.
After testing that the very small, fixed functionality you provide is correct:
none.
Small, "completed", well-tested libraries being flagged as security issues due to being unmaintained seems to be starting to become an issue.
Rust dev culture is allergic to any code that hasn't changed significantly in the last 3 months. Too easy, too stable.
All the comments and suggestions for improving rust dependency handling seem useful to me. To deal with dependency sprawl now, until the situation changes, I use a number of tools. To avoid having to set this up for each new project, I've made a template project that I simply unzip to create new rust projects.
The tools I have found useful are:
cargo outdated # check for newer versions of deps
cargo deny check # check dependency licenses
cargo about # generate list of used licenses
cargo audit # check dependencies for known security issues
cargo geiger # check deps for unsafe rust
I haven't found a cargo tool I like for generating SBOMs, so I installed syft and run that.
cargo install-update # keep these tools updated
cargo mutants # not related to deps, but worth a mention, used when testing.
Having configured all these tools once and simply unzipping a template works well for me.
Suggestions for different or additional tools welcome!
Disclaimer: I'm not a professional rust developer.
Rust at least has a partial remedy to this problem: feature flags. Many libraries use them to gate features which would otherwise pull in extra dependencies. (In fact I believe there is specific support for flags which correspond to dependency names.)
> I can't rewrite the world, an async runtime and web server are just too difficult and take to long for me to justify writing for a project like this (although I should eventually just for a better understanding).
I did this and it only solved half of the bloat:
https://crates.io/crates/safina - Safe async runtime, 6k lines
https://crates.io/crates/servlin - Modular HTTP server library, threaded handlers and async performance, 8k lines.
I use safina+servlin and 1,000 lines of Rust to run https://www.applin.dev, on a cheap VM. It serves some static files, a simple form, receives Stripe webhooks, and talks to Postgres and Postmark. It depends on some heavy crate trees: async-fs, async-net, chrono, diesel, rand (libc), serde_json, ureq, and url.
2,088,283 lines of Rust are downloaded by `cargo vendor` run in the project dir.
986,513 lines using https://github.com/coreos/cargo-vendor-filterer to try to download only Linux deps with `cargo vendor-filterer --platform=x86_64-unknown-linux-gnu`. This still downloads the `winapi` crate and other Windows crates, but they contain only 22k lines.
976,338 lines omitting development dependencies with `cargo vendor-filterer --platform=x86_64-unknown-linux-gnu --keep-dep-kinds=normal`.
754,368 lines excluding tests with `cargo vendor-filterer --platform=aarch64-apple-darwin --exclude-crate-path='*#tests' deps.filtered`.
750k lines is a lot to support a 1k-line project. I guess I could remove the heavy deps with another 200 hours of work, and might end up with some lean crates. I've been waiting for someone to write a good threaded Rust Postgres client.
I've come to accept that I wasn't really developing in "Rust" but in "Tokio-Rust", and stopped worrying about async everywhere (it's not fundamentally different from what happens in other languages that have async).
Why the need to go back to threaded development?
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust. Removing the vendored packages reduces this to 11136 lines of rust.
Out of those 3.6 million lines, how many are lines of test code?
I'm quite careful to tightly control the dependencies of Tokio. All dependencies are under control by members of the Tokio team or others that I trust.
What actually surprised me in Rust is the amount of fragmentation and abandoned libraries. For example, serde_yaml is archived and there are two other libraries that do the same (?) thing. It seems like significant effort is required to search for and decide which library to use (if any). This is not nearly as pronounced in Go.
Yeah, one problem in Rust is that a number of very fundamental ecosystem libraries are written by a handful of high-profile people, often people who are also working on the standard library or the Rust compiler. Rust developers usually know their names and social media handles.
It's a problem because those people become overworked, and eventually have to abandon things. The deprecation of `serde_yaml` was and is a huge, huge problem, especially without any functional replacement. There was no call for new maintainers, or for someone to take over the project. I can understand the reasons why (now you're suddenly auditing people, not code), but it sucks.
Maybe that's the double edged sword of making the package manager so integrated into the language.
How is cargo more integrated into the language than Go’s? I’ve little to no experience with Rust, but Go’s package management seems pretty fully integrated to me.
You can audit your dependencies for crates with security vulnerabilities reported to the RustSec Advisory Database, also block unmaintained crates, and enforce your license requirements using SPDX expressions with cargo-audit and cargo-deny.
You can ensure that third-party Rust dependencies have been audited by a trusted entity with cargo-vet.
And you should have taken a look at where those 3M lines of code come from; it's usually Microsoft's windows-rs crates, transitively included in your dependencies through default features and the build targets of crates built to run on Windows.
The solution is strong compile time and runtime guarantees about code behavior.
The author is right that there's no way an individual can audit all that code. Currently all that code can run arbitrary build code at compile time on the dev's machine, and it can also run arbitrary unsafe code at runtime, make system calls, etc.
Software is not getting simpler, the abundance of high quality libraries is great for Rust, but there are bound to be supply chain attacks.
AI and cooperative auditing can help, but ultimately the compiler must provide more guarantees. A future edition of Rust should come with an inescapable effect system. Work on effects in Rust has already started; I am not sure if security is a goal, but it needs to be.
The (terrible) solution that we are seeing now is generative AI. Instead of importing a library, you ask an AI to write the code for you, the AI most likely has ingested a library that implements the features you need and will essentially copy-paste that part into your code, transforming it so that it matches the rest of your code.
I believe that it causes more problems than it solves, but it can be a solution to the problem of adding thousands of lines of code of dependency when you could write a 10-line function yourself.
Of course, the proper thing to do is not to be the wrong kind of lazy and to understand what you are doing. I say the wrong kind of lazy because there is a right kind of lazy, and it is about not doing things you don't need to, as opposed to doing them poorly.
> Many call for adding more to the rust standard library much like Go
This is the way.
There should be a second stdlib with relaxed stability guarantees. Don't fill the normal stdlib full of cruft that can never be changed again.
Actually, a proposal for exactly this was published yesterday: https://github.com/rust-lang/rfcs/pull/3810
It's unfortunate that the response so far hasn't been very positive
That proposal is not exactly this; that seems to propose a "blessed crates" namespace which includes popular open-source libraries. I read this proposal as a Python-style batteries-included stdlib.
What the OP proposes is not exactly a bigger stdlib, because they mention it should have "relaxed stability guarantees". Or is python allowed to change their stdlib in backwards-incompatible ways?
It does happen - after a deprecation period, usually small changes, but frequently (i.e. in every minor version there will surely be someone directly affected). More recently there were entire swaths of modules removed - still a conservative change, because we're talking mainly about support for obscure file formats and protocols that hardly anyone has used this century (see https://peps.python.org/pep-0594/ for details - I may be exaggerating, but not by a lot).
Historically this process has been mostly informal; going forward they're trying to make sure that things get removed at a specific point after their deprecation. Python has also now adopted an annual release cadence; the combination of that with the deprecation policy effectively makes their versioning into a pseudo-calver.
Yeah, they do so regularly. Every version removes several old stdlib features (after the last version in which they were not deprecated goes EOL)
So we reinvent Java's bloated SDK again, with all of the "javax" packages. What's old is new?
There's a reason why "Java's bloated SDK" is the most popular way of writing critical software in the world right now.
Perhaps because it's a good idea.
Well, it turned out that the alternative is even worse, so... let's chalk it down to learning experience.
I'm outside of the Rust community, so my two cents are worthless - but in this thread it seems a lot of people are actually wanting a defacto app framework, not necessarily a bloated "kitchen sink" style stdlib.
The stdlib probably should remain simple, in my opinion. The complexity should be optional.
Yeah, I agree. Something like the Boost lib for C++
A strong advantage of that approach is that you don't need to be the core Rust team to do it. Anyone who wants to do this can just start doing it now.
I agree. Unfortunately, I think that a lot of the people who ask for a bigger standard library really just want (a) someone else to do the work (b) someone they can trust.
The people working on Rust are a finite (probably overextended!) set of people and you can't just add more work to their plate. "Just" making the standard library bigger is probably a non-starter.
I think it'd be great if some group of people took up the very hard work to curate a set of crates that everyone would use and provide a nice façade to them, completely outside of the Rust team umbrella. Then people can start using this Katamari crate to prove out the usefulness of it.
However, many people wouldn't use it. I wouldn't because I simply don't care and am happy adding my dependencies one-by-one with minimal feature sets. Others wouldn't because it doesn't have the mystical blessing/seal-of-approval of the Rust team.
let's put a price on it
This is only an advantage if the core Rust team is uncooperative, which is sad rather than something to be happy about.
The "Rust core team" should be working on the "Rust core", not every little thing that someone somewhere thinks should go in a standard library. It is part of the job of a "core team" to say "no".
A lot.
Like, a lot a lot a lot. Browse through any programming language that has an open issue tracker for all the closed proposals sometime. Individually, perhaps a whole bunch of good ideas. The union of them? Not so much.
This is obviously the best solution for Rust. A 'metalibrary' library type would add a lot of value to the ecosystem as a nexus:
There could be general "everything and the kitchen sink" metalibraries, metalibraries targeted at particular domains or industries, metalibraries with different standards for stability or code review, etc. It might even be valuable enough to sell support and consulting...The non-standard library, if you will.
No way. I'd much prefer we have a constellation of core companion libraries like Google's Guava.
We do not need to saddle Rust with garbage that will feel dated like Python's standard library. Cargo does the job just fine. We just need some high quality optional batteries.
Embedded projects are unlikely to need standard library bloat. No_std should be top of mind for everyone.
Something that might make additional libraries feel more first class: if cargo finally got namespaces and if the Rust project took on "@rust/" as the org name to launch officially sanctioned and maintained packages.
Python's standard library is the main reason python is usable.
Python packaging is somehow a 30-year train crash that keeps going, but the standard library is good enough that I can do most things without dependencies, or with a very small number of them.
I don't think an additional standard library layer, whatever you call it, has to have the same tight controls on backwards compatibility and evolution that the actual standard library has. IMO the goal of creating it should be to improve supply chain security, not to provide an extremely stable API, which might be more of a priority at lower levels but chokes off the kind of evolution that will be needed.
I think what you're suggesting is a great idea for a new standard library layer, you're just not using that label. A set of packages in a Rust namespace, maintained by the same community of folks but under policies that comply with best practices for security and some additional support to meet those best practices. The crates shouldn't be required, so no_std should work just as it would prior to such a collection.
Python's garbage works everywhere there is a full CPython implementation, I see that as an advantage.
I develop for Linux, Mac, and Windows. Multiple architectures and OSes. I rarely see platform issues with Rust. It's typically only stuff at the edge, like CUDA libraries, that trip up cross-platform builds.
Rust, as a systems language, is quite good at working on a variety of systems.
It starts with Rust not supporting architectures that aren't available in LLVM but are in GCC; otherwise, having a Rust frontend project for GCC wouldn't be a thing.
And on the systems-language remark, I am still looking forward to the day when sorting out ABI issues for binary libraries no longer has to go through solutions designed for C and C++.
What architectures that are missing from LLVM are commercially relevant today and not on well-earned retirement?
> We do not need to saddle Rust with garbage that will feel dated like Python's standard library.
Python's standard library is a strength, not a weakness. Rust should be so lucky. It's wonderful to have basic functionality which is guaranteed to be there no matter what. Many people work in environments where they can't just YOLO download packages from the Internet, so they have to make do with whatever is in the stdlib or what they can write themselves.
> Python's standard library is a strength, not a weakness. Rust should be so lucky.
Rust is luckier. It has the correct approach. You can find every battery you need in crates.io.
Python has had monstrosities like urllib, urllib2, http, etc. All pretty much ignored in favor of the external requests library and its kin. The standard library also has inconsistencies in calling conventions and naming conventions and it has to support those *FOREVER*.
The core language should be pristine. Rust is doing it right. Everything else you need is within grasp.
> The standard library also has inconsistencies in calling conventions and naming conventions and it has to support those *FOREVER*.
Not to mention abysmal designs inspired by cargo-cult "OOP" Java frameworks from the 90s and 00s. (Come on, folks. Object-oriented programming is supposed to be about objects, not about classes. If it were about classes, it would be called class-oriented programming.)
"Rust is doing it right."
Standard response every time there is some criticism of Rust.
bigstrat2003's argument is approximately "Python is batteries included"
My counter argument is that the "batteries included" approach tends to atrophy and become dead weight.
Your counter seems to be "that's not an argument, that's just Rust hype."
Am I interpreting you correctly? Because I think my argument is salient and correct. I don't want to be stuck with dated APIs from 20 years of cruft in the standard library.
The Python standard library is where modules go to die. It has two test frameworks nobody uses anymore, and how many XML libraries? Seven? (The correct answer is "four", I think. And that's four too many.) The Python standard library has so much junk inside, and it can't be safely removed or cleaned up.
A standard library should be data structure/collections, filesystem/os libraries, and maybe network libraries. That's it. Everything else changes with too much regularity to be packed in.
Your critique doesn't match the reality of Python users.
There is a single datetime library. It covers 98% of use cases. If you want the final 2% with all the bells and whistles you can download it if you wish. There is a single JSON library. It's fast enough for almost anything you want. If you want faster libraries with different usability tradeoffs you can use one but I have never felt compelled to do so.
Same thing with CSV, filesystem access, DB api, etc. They're not the best libraries at the time of any script you're writing, but the reality is that you never really need the best, most ergonomic library ever to get you through a task.
Because of this, many big complex packages like Django have hardly any external dependencies.
If anything, you're not the one getting stuck with dated APIs; it's the Python core devs. Maintainers of other packages are always free to choose other dependencies, but they almost invariably find that the Python stdlib is good enough for everything.
The Python datetime library is legacy software and has terrible ergonomics, terrible safety, and heinous pitfalls. It's one of my least favorite in the industry.
https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls...
But now you're stuck with it forever.
Python is packed full with this shit. Because it wasn't carefully planned and respect wasn't given to decisions that would last forever.
Python has two testing frameworks baked in, neither of which is good.
Python has historically had shitty HTTP libraries and has had to roll out several versions to fix the old ones because it couldn't break or remove the old ones. Newbies to the language will find those built in and will write new software with the old baggage.
Batteries included is a software smell. It's bad. You can't change the batteries even after they expire.
> The Python datetime library is legacy software and has terrible ergonomics, terrible safety, and heinous pitfalls. It's one of my least favorite in the industry.
Your arguments seem to come from someone who doesn't have substantial software engineering experience in large systems.
All large software systems and most effective software uses libraries that are not generally super modern and not necessarily the best of the best, but they are well-understood.
In your example for datetime libraries, notice that the writer immediately ignores libraries that at some point were better than the stdlib library, but are now unmaintained. That by itself is already a red flag; it doesn't matter that a library is better if there is a large risk that it is abandoned.
Notice that no single library in the examples mentioned solves all the problems. And notice that there is no such thing as a datetime library anywhere that has consistent, uniform and order-of-magnitude improvements such that they merit dropping the stdlib.
The stdlib is _good enough_. You can build perfectly good business systems that work reasonably well, and as long as you have a couple of basic ideas down about how you lay out datetime usage you'll be mostly fine. I've been working with Python for over 15 years, and any time I picked a different datetime library it was just an additional maintenance burden.
> But now you're stuck with it forever.
You're "stuck" with whatever datetime library you choose. One day your oh-so-great datetime library is going to be legacy and you'll be equally bamboozled in migrating to something better.
I've heard this argument about SQLAlchemy, the Django ORM, and various other packages. The people that chose to go somewhere less maintained are now stuck in legacy mode too.
> Python is packed full with this shit. Because it wasn't carefully planned and respect wasn't given to decisions that would last forever.
This is pure ignorance. There's not a single language standard library that is absolutely amazing. Yet the batteries-included approach ends up being a far better solution long term when you look at the tradeoffs from an engineering perspective.
> Python has two testing frameworks baked in, neither of which is good.
They are good enough. They have broad support with tons of plugins. They have assertions. They get the basics right and I've had success getting useful tests to pass. This is all that matters; your tests don't magically become better because you decided to use nose or whatever framework of the day you choose.
> Python has historically had shitty HTTP libraries and has had to roll out several versions to fix the old ones because it couldn't break or remove the old ones. Newbies to the language will find those built in and will write new software with the old baggage.
The current python docs recommend requests, and requests is a well-established package that everyone uses and is not at risk of being outdated, as it's been the go-to standard for over a decade. This is fine. If you're a library writer you're better off using urllib3 and avoiding an additional dependency.
> Batteries included is a software smell. It's bad. You can't change the batteries even after they expire.
Try to revive a line-of-business Node.js app written 10 years ago with hundreds of outdated dependencies. An equivalent Python app will have a half dozen dependencies at most, and if you stuck to the most popular packages there's a really high chance an upgrade will be smooth and easy. I've done this multiple times; tons of colleagues have had to do this often. Python's decision makes this tremendously easy.
So sorry, but if you're aiming for library perfection, you're not aiming for writing maintainable software. Software quality happens in the aggregate, not in choosing the fanciest, most modern thing.
The issue with that is how to get everyone to agree on how that would work, e.g. what the criteria for this extension would be, what's the policy for future changes, who will maintain all of this, etc etc.
Now instead of seeing millions of lines of inscrutable code in your program bloating binary sizes, you can see it in every program (that doesn't disable stdlib).
In every program that uses a particular feature from the stdlib. Given the same feature, I tend to trust stdlib more than some rando project. And if you don't trust the stdlib, why would you trust the compiler?
Indeed, yes sometimes this brings cruft into the mix.
However, I'd rather have cruft that works everywhere the toolchain is fully implemented, instead of playing whack-a-mole with third-party libraries when only some platforms are supported.
I think that the bare bones stdlib is a huge mistake in Rust. I would love to see that rectified. Unfortunately, approximately 5 other people share that view. The Rust community as a whole is very opposed to adding functionality to std.
I mean, the case against it is pretty strong. Many languages with maximalist standard libraries have tons of vestigial code that nobody uses because the ecosystem found better solutions. Yet that code has to be maintained in perpetuity.
The C++ standard library even has this problem for something as basic as formatting (iostreams), and now it has two solutions for the same problem.
That is a serious burden on the maintainers, it creates all kinds of different problems, especially if the functionality of the libraries assumes a certain execution environment. Rust doesn't just target x86 desktops.
Go doesn't just target x86 desktops either
And? Not every project has the same amount of resources.
There is a tradeoff here. Having a large, but badly maintained, standard library with varying platform support is worse than having a smaller, but well maintained, one.
If you look at the numbers, Golang has 2000+ contributors, while Rust has 5000+
Golang's core dev team is something like 30 people.
So Rust does have the resources.
The amount of contributors is a totally meaningless metric.
1. Not every contributor contributes equally. Some contributors work full time on the project, some work a few hours a month.
2. The amount of contributors says nothing about what resources are actually required. Rust is, no doubt, a more complex language than go and is also evolving faster.
3. The amount of contributors says nothing about the amount of contributors maintaining very niche parts of the ecosystem.
Rust has a million ways to solve a specific problem, as it is not opinionated and gets you down to the lowest level if needed. On top of that there's a million ways to encode your types. Then there's a million ways to bind C libraries.
The solution space is basically infinite, and that's a good thing for a systems programming language. It's kind of amazing how far rust reaches into higher level stuff, and I think the way too easy to use package manager and lively crate ecosystem is a big part of that.
Sometimes I wish for a higher-level rust-like language though, opinionated as hell with garbage collector, generic functions without having to specify traits, and D's introspection.
Have you tried Go?
No mention here of binary size (beyond linking out to a ClickHouse blog post on the topic).
The total number of lines of code is relevant, sure, but for most practical purposes, compile times and binary sizes are more important.
I don't know the situation in Rust, but in JS land, there's a pretty clear divide between libraries that are tree-shakable (or if you prefer, amenable to dead code elimination) and those that aren't. If you stick to tree-shakable dependencies your final bundled output will only include what you actually need and can be pretty small.
> The total number of lines of code is relevant, sure, but for most practical purposes, compile times and binary sizes are more important.
Perhaps for most practical purposes, but not for security, which the article's author seems more concerned with:
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust... How could I ever audit all of that code?
Tree-shaking can't help with that.
Like many compiled languages, Rust does dead code elimination on everything.
Right, but it depends on how the code is written, right?
If you use mostly free functions things will shake out naturally, if you use lots of dynamic dispatch you'll pull in stuff that doesn't get called.
Rust does static dispatch even for non-free functions. Only trait objects are dynamically dispatched, and most people argue they’re under-used in Rust, not overused.
If that does become a problem, there are also techniques like https://github.com/rust-lang/rust/issues/68262 too.
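To make the dispatch point concrete, here's a small illustrative contrast (not from the article): generics are monomorphized and statically dispatched, so unused instantiations can be dropped, while coercing to a trait object keeps the whole vtable for that type alive whether or not every entry is ever called.

```rust
use std::fmt::Display;

// Static dispatch: monomorphized per concrete type; instantiations that are
// never used (and whatever only they call) can be eliminated at link time.
fn describe_static<T: Display>(value: T) -> String {
    format!("static: {value}")
}

// Dynamic dispatch: the call goes through a vtable. Once a type is coerced to
// `dyn Display`, its vtable entries must be kept in the binary even if this
// particular call path never exercises them.
fn describe_dyn(value: &dyn Display) -> String {
    format!("dyn: {value}")
}

fn main() {
    println!("{}", describe_static(42u32)); // only describe_static::<u32> is instantiated
    println!("{}", describe_dyn(&3.5f64));  // keeps f64's Display vtable alive
}
```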
I had the same concerns when I started using Rust, but then I eventually embraced it, for better or worse. Cargo makes it so your build almost never breaks (it's happened maybe twice in the 8 years I've been doing Rust). Plus there are still far fewer vulnerabilities in Rust projects than in non-Rust projects, in spite of the crazy number of dependencies.
If I was to design a Rust 2.0, I'd make it so dependencies need permissions to access IO, or unsafe code, etc.
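No such permission system exists today, but you can approximate the spirit of it by hand with capability-passing: the application constructs the handles, and a dependency only gets to use what it is given. The sketch below is hypothetical (the trait and type names are made up), and of course nothing stops a malicious crate from reaching for `std::fs` directly; ruling that out is exactly what a compiler-enforced permission or effect system would have to do.

```rust
use std::io::{self, Read};

// Hypothetical capability: the only thing the "dependency" is allowed to touch.
pub trait ConfigSource {
    fn read_config(&mut self) -> io::Result<String>;
}

// The application (not the library) decides what backs the capability.
pub struct FileConfig(pub std::fs::File);

impl ConfigSource for FileConfig {
    fn read_config(&mut self) -> io::Result<String> {
        let mut buf = String::new();
        self.0.read_to_string(&mut buf)?;
        Ok(buf)
    }
}

// "Dependency" code: no ambient file or network access, just the capability.
pub fn load_settings(source: &mut impl ConfigSource) -> io::Result<Vec<String>> {
    Ok(source.read_config()?.lines().map(str::to_owned).collect())
}
```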
Totally agree with the core concern—dependency sprawl is no longer just a JS/Python issue, it's now visibly hitting Rust as well.
One thing I've observed while managing a mid-sized Rust codebase: cargo does a decent job with versioning, but the long tail of small, redundant crates (often differing only slightly) can still bloat the tree. The lack of a strong ecosystem-level curation layer makes it hard to know which crates are battle-tested vs. weekend hacks.
Maybe it’s time the community seriously considers optional “trust scores” or soft standards (similar to crates.io keywords, but more structured) to guide adoption. Not a gatekeeping mechanism—just more context at decision time.
Excuse me for not having much to add to the discussion but two interesting references for people to check out, if so inclined of course:
a) Ginger Bill (the Odin language creator, no affiliation) stated on a podcast that Odin will never have an official pkg manager, since what they're, in his opinion, mainly automating is dependency hell, and this being one of the main reasons for rising software complexity and lower software quality; see https://www.youtube.com/watch?v=fYUruq352yE&t=11m26s (timestamped to the correct position) (they mention Rust explicitly as an example)
b) another programmer rather seriously worried about software quality/complexity is Jonathan Blow, whose talk "Preventing the Collapse of Civilization" is worth watching in my opinion: https://www.youtube.com/watch?v=ZSRHeXYDLko (it's not talking about package managers specifically, but is on topic regarding software complexity/quality as a whole)
Addendum: And sorry, I feel like almost everyone knows this xkcd by now, but since no one so far seems to have posted it; "obligatory xkcd reference": https://imgs.xkcd.com/comics/dependency_2x.png
> a) Ginger Bill (the Odin language creator, no affiliation) stated on a podcast that Odin will never have an official pkg manager
The cognitive dissonance of believing that Rust preventing you from dereferencing freed memory at compile time is overzealous nannying by the language authors -- while at the same time deliberately making code reuse harder for users because they might make engineering decisions he doesn't like -- is staggering.
> a)... Odin will never have an official pkg manager
Perhaps this explains why Odin has found such widespread usage and popularity. /s
> b)... Jonathan Blow, who's talk "Preventing the Collapse of Civilization"
With such a grandiose title, before I first watched I thought it must be satire. Turns out, it is food for the credulous. I believe Jonathan Blow is less "seriously worried about software quality/complexity" than he is about marketing himself as the "last great hope". At least Blow's software has found success within its domain. However, I fear Blow's problem is the problem of all intellectuals: “An intellectual is a person knowledgeable in one field who speaks out only in others.” Blow has plenty of opinions about software outside his domain, but IMHO very little curiosity about why his domain may be different than your own.
My own opinion is that there is little evidence to show this is a software quality problem, and any assertion that it is needs to compare the Rust model against the putatively "better" alternatives. Complex software, which requires many people to create, sometimes across great distances of time and space, will necessarily have and require dependencies.
Can someone show me a material quality difference between ffmpeg, VLC, and Samba dependencies and any sufficiently complex Rust program (even which perhaps has many more dependencies)?
Now, large software dependency graphs may very well be a security problem, but it is a problem widely shared with all other software.

Odin is pragmatic and opinionated in its language design and goals. Maybe the lack of a package manager is reason enough for you to disregard a programming language; for plenty of others (and likely more of Odin's target group) it's the least of their concerns when choosing a language.
> What an unnecessarily snarky and dismissive comment to make about someone's work.
The snark was intended, however any dismissiveness concerning Ginger Bill's effort was not. However, when you make a decision like "Odin will never have a package manager", you may be choosing to condemn your project to niche status, in this day and age. Now, niche status is fine, but it definitionally comes with a limited audience. Like "this game will only ever be a text based roguelike."
I see a lot of concern like this about dependencies, mostly in Node. I'm sure it's an issue; I'm just not convinced it's as big of a problem as people say. We have scanners that can help keep your dependencies secure automatically. If you take a dependency and it goes unmaintained, is it really that much worse than the relevant code in your own codebase going unmaintained?
Vendoring is a step in the right direction: you've constrained one side of the equation.
But you’re still open to typo squatting and similar issues like crates falling unmaintained - the article mentions the now famous dotenv vs. dotenvy issue (is this solvable with a more mature governance model for the crates ecosystem? At this point dotenv should probably be reclaimed). So after vendoring a baseline set of dependencies, you need to perform comprehensive auditing.
Maybe you can leverage LLMs to make that blob of vendored deps smaller / cheaper to own. Maybe you can distill out only the functionality you need (but at what cost, now you might struggle to backport fixes published upstream). Maybe LLMs can help with the auditing process itself.
You need a stream of notifications of upstream fixes to those vendored deps. Unfortunately in the real world the decision making will be harder than “ooh, there’s a sec fix, I should apply that”.
I always wonder why someone like JFrog doesn't expand their offering to provide "trusted dependencies" or something similar, i.e. you pay to outsource that dependency governance and auditing. Xray scanning in the current product is a baby step toward the comprehensiveness I'm suggesting.
Taking a step back though, I'd be really careful not to throw the baby out with the bath water here. Rust has a fairly unique capability to compose work product from across unrelated developers thanks to its type system implementation (think about what happens with a C library: who's responsible for freeing the memory, you or me?). Composition at scale is Rust's superpower, at least in terms of the productivity equation for large enterprises - in this context memory safety is not the sales pitch, since they already have Java or whatever.
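To make the C-library comparison concrete (an illustrative snippet, not from any real API): in Rust the ownership in the signature answers the "who frees it" question, whereas a C header can only document a convention.

```rust
// Returning an owned String hands ownership to the caller; a &str parameter
// says "borrowed, you keep it". A C API exposing char* would need prose in the
// header to say whether the caller or the library calls free(), and nothing
// would check that anyone read it.
fn build_report(name: &str) -> String {
    format!("report for {name}")
}

fn main() {
    let report = build_report("acme");
    println!("{report}");
} // `report` is dropped here automatically: no manual free, no double free.
```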
Existing discussion in https://news.ycombinator.com/item?id=43930640
I agree that there are too many dependencies in Rust. I support the idea of adding some of the more popular crates to std. Many applications use something like tracing, tracing-subscriber, and basic server/client functionality. It would be great to have simple, minimal-feature implementations of these in std — similar to how Go does it. If someone needs a more complex system, they can still use an external crate, but having basic building blocks in std would really help.
author here, thanks for reading and all of your thoughts! Here's another older HN thread with some interesting comments. https://news.ycombinator.com/item?id=43930640
So asking HN: whatever happened with OSGi? Does that architecture solve the problem, and if no, why not?
https://docs.osgi.org/specification/osgi.core/7.0.0/framewor...
"How OSGi Changed My Life" (2008) https://queue.acm.org/detail.cfm?id=1348594
This is my first encounter with OSGi. It seems to me that the "Lego hypothesis" reflects an increasingly justified approach. The ACM Queue article mentions hot plugging and dependency injection, and a comment[0] in this thread brings up Sans IO. This also ties into capabilities, as a security measure but also as an approach to modularity. The common thread is that programs should be written with a strong sense of boundaries: both what is included and what is not included is vital, and the boundary must allow the inside to communicate with the outside. Push dependencies to the boundary and create interfaces from them. The general principles for trivially pluggable components are all out there now. More efforts like OSGi will be needed to put these principles into practice.
[0] https://news.ycombinator.com/item?id=43944511
I used to be firmly in the component oriented camp. The reality of the matter is that the conceptual (mental) model doesn't really represent the reality of composing with reusable components.
All Lego components have the same simple standard mechanism: friction coupling using concave and convex surface elements of the component. Unix pipes are the closest thing we have to a Lego like approach and there the model of "hooking pipes of bytes from sources to sinks" actually represents what happens with the software.
With components and APIs, unless we resort to some universal baseline (such as a small finite semantic API like REST's "verbs") that can basically marshall and unmarshall any arbitrary function call ('do(func, context, in-args, out-args, out-err)'), the Lego metaphor breaks down very quickly.
The second issue is the modalities of 'interactions' between components. So this is my first encounter with "Sans-IO" (/g), but this is just addressing the interactions issue by fiat: 'no interactions by components'. So Lego for software: a great overall expression of desired simplicity, but not remotely effective as a generative concept and imo even possibly detrimental (as it oversimplifies the problem).
Now we have 2 different pieces of software tech that somewhat have managed to arrive at component orientation: using a finite set of predefined components to build general software. One is GUI components, where a small set of visual components and operational constructs ("user-events", etc.) with structural and behavioral semantics are used to create arbitrary visual interfaces for any ~domain. The other is WWW where (REST verbs of) HTTP also provide a small finite set of 'components' (here architectural) to create arbitrary services. With both, there is the tedious and painful process of mapping domain semantics to structural components.
So we can get reusable component oriented software (ecosystems) but we need to understand (per lessons of GUIs and WebApps) that a great deal of (semantic) glue code and infrastructure is necessary, just as a lot of wiring (for GUIs) and code frameworks (for WebApps) are necessary. That is what something like OSGi brings to the table.
This then leads to the question of component boundary and granularity. With things like DCOM and JEE you have fine-grained components aggregated within process boundaries. The current approach is identifying the process boundary as the component boundary (Docker, k8s, microservices), doing away with 'application servers' in the process.
> With components and APIs, unless we resort to some universal baseline (such as a small finite semantic API like REST's "verbs") that can basically marshall and unmarshall any arbitrary function call ('do(func, context, in-args, out-args, out-err)'), the Lego metaphor breaks down very quickly.
I agree that this is generally what happens, and I would like to suggest that there is a better, harder road we should be taking. The work of programming may be said to be translation, and we see that everywhere: as you say, mapping domain semantics (what-to-do) to structural components (how-to-do-it), and compilers, like RPC stub generation. So while a few verbs along the lines of RPC/REST/one-sided async IPC are the domain of the machine, we programmers don't work well with that. It's hard to agree, though, and that's not something I can sidestep. I want us to tackle the problem of standardization head-on. APIs should be easy to define and easy to standardize, so that we can use richly typed APIs with all the benefits that come from them. There's the old dream of making programs compose like procedures do. It can be done, if we address our social problems.
> The second issue is the modalities of 'interactions' between components. So this is my first encounter with "Sans-IO" (/g), but this is just addressing the interactions issue by fiat: 'no interactions by components'. So Lego for software: a great overall expression of desired simplicity, but not remotely effective as a generative concept and imo even possibly detrimental (as it oversimplifies the problem).
I'm not sure what you mean, so I may be going off on a tangent, but Sans IO, capabilities, dependency injection etc. are more about writing a single component than any inter-component code. The part that lacks IO and the part that does IO are still bundled (e.g. with component-as-process). There is a more extensive mode, where whoever controls a local subsystem of components decides where to put the IO manager.
> Now we have 2 different pieces of software tech that somewhat have managed to arrive at component orientation: using a finite set of predefined components to build general software.
> So we can get reusable component oriented software (ecosystems) but we need to understand (per lessons of GUIs and WebApps) that a great deal of (semantic) glue code and infrastructure is necessary, just as a lot of wiring (for GUIs) and code frameworks (for WebApps) are necessary.
I agree, which is why I want us to separate the baseline components from more powerful abstractions, leaving the former for the machine (the framework) and the latter for us. Does the limited scope of HTTP by itself mean we shouldn't be able to provide more semantically appropriate interfaces for services? The real issue is that those interfaces are hard to standardize, not that people don't make them.
> I agree, which is why I want us to separate the baseline components from more powerful abstractions, leaving the former for the machine (the framework) and the latter for us. Does the limited scope of HTTP by itself mean we shouldn't be able to provide more semantically appropriate interfaces for services? The real issue is that those interfaces are hard to standardize, not that people don't make them.
We're likely in general agreement in terms of technical analysis. Let's focus on the concrete metric of 'economy' and hand-wavy metric of 'natural order'.
Re the latter, consider the thought that 'maybe the reason it is so difficult to standardize interfaces is because it is a false utopia?'
Re the former, the actual critical metric is 'is it more economical to create disposable and ad-hoc systems, or to amortize the cost of a very "hard" task across 1 or 2 generations of software systems and workers?'
Now, the industry voted with its wallets and the blog propaganda of 'fresh engineers' with no skin in the component-oriented approach in the early '00s. That entire backlash, which included the "noSQL" movement, was in fact, historically, a shift mainly motivated by economic considerations aided by a few black swans, like Linux and containerization. But now the 'cost' of the complexity of assembly, deployment, and orchestration of a system based on that approach is causing information overload for the workers. And now we have generative AI, which seems to further tip the economic balance in favor of the late-stage ad-hoc approach to putting a running system together.
As to why I used 'natural order'. The best "Lego like" system out there is organic chemistry. The (Alan) Kay vision of building code like nature builds organisms is of course hugely appealing. I arrived at the same notions independently when younger (post architecture school) but what I missed then and later realized is that the 'natural order' works because of the stupendous scales involved and the number of layers! Sure, maybe we can get software to be "organic" but it will naturally (pi) present the same perplexity to us as do biological systems. Do we actually fully understand how our bodies work?
(Just picking old professional scabs here)
I see two points: safety - a bigger supply chain attack surface - and code bloat/compiler performance. The latter has been discussed in numerous posts here (the whole idea of a linker from the start was to get rid of unused functions, so not a big problem imo). Safety is a serious and legitimate consideration, but we also rely on Linux and build tools to build things. How do you know the compiler that was used to build Linux hasn't been compromised, perhaps several generations ago, and now your Linux has a backdoor that is not in the Linux source code? There was a paper on this IIRC (Ken Thompson's "Reflections on Trusting Trust"). We trust the ecosystem to validate each tool we use. We just have to do the same with our own projects - only use what's relevant, and practice dependency hygiene by checking that it comes from a reputable source...
Dependency and build management is a fascinating and still unsolved problem in software engineering (in some sense it is the central problem).
I am wondering if there is a good modern reference that provides a conceptual overview or comparative study of the various techniques that have been attempted.
It is a hard subject to define as it cuts through several layers of the stack (all the way down to the compiler-system interface layer), and most books focus on one language or build technology rather than providing a more conceptual treatment of the techniques used.
Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust. Removing the vendored packages reduces this to 11136 lines of rust.
Tokei hasn't had a stable release in over 4 years and misreports lines of code in some instances. The author in the past has basically said they would need to be paid to backport one line fixes with no merge conflicts that fix real accuracy issues in their software... Bad look in my book.
https://github.com/XAMPPRocky/tokei/issues/875
Everyone is in such a rush to get their project out the door, no one has time to generate a key and properly code-sign releases and begin developing a more secure chain. Now we have the JS package "whatever code" ecosystem, but for Rust. As if we haven't watched NPM get hacked many times over the last decade or so.
> Everyone is in such a rush to get their project out the door
This is the cause of so many issues.
And it's not like we're at war or trying to cure the next pandemic; we're writing CRUD apps and trying to convince people to click on ads for crap they don't need.
> As if we haven't watched NPM get hacked many times over the last decade or so.
When has this happened? The only one I remember is the event-stream thing, and that was what, over five years ago? Doesn't seem all that common from what I can see?
3.6M lines of code seems so much that it sets off my "are you sure that's counted right?" alarm.
I'm not very familiar with Rust, but all of Go is 1.6M lines of Go code. This includes the compiler, stdlib, tests for it all: the lot.
Not that I doubt the sincerity of the author of course, but maybe some irrelevant things are counted? Or things are counted more than once? Or the download tool does the wrong thing? Or there's tons of generated code (syscalls?)? Or ... something? I just find it hard to believe that some dependencies for web stuff in Rust is twice all of Go.
This other person wrote their own async runtime and web server from scratch to reduce bloat, and their rust app still vendors 2 million lines of code:
https://news.ycombinator.com/item?id=43942055
Sounds like a lot is just (generated) syscall stuff? Either way, that doesn't really explain anything.
That's one part of Rust bloat and slow compilation times.
But there's still a ton of dependency code there.
> This whole fiasco led me to think .... do I even need this crate at all? 35 lines later I had the parts of dotenv I needed.
"A little copying is better than a little dependency." - grab the parts that you need and then include the library only in a test to ensure alignment down the line, an idea I liked a lot.
https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s
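One hedged way to put that into practice (the crate and function names below are placeholders, not a real package): copy the handful of lines you actually need into your own tree, and keep the original crate only as a dev-dependency so a test can flag it if your copy and upstream ever drift apart.

```rust
// The ~20 lines you actually needed, copied/adapted into your own code.
pub fn transform(input: &str) -> String {
    input.trim().to_ascii_lowercase()
}

#[cfg(test)]
mod tests {
    // Cargo.toml:
    //   [dev-dependencies]
    //   somecrate = "1"   # placeholder name; never ships in release builds
    #[test]
    fn stays_aligned_with_upstream() {
        for case in ["  MiXeD Case  ", "plain"] {
            assert_eq!(super::transform(case), somecrate::transform(case));
        }
    }
}
```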
I think the main problem is that you should be able to run dependencies inside their own sandbox, and the language focuses only on memory safety within a monolithic program.
The problem is that if you put library dependencies in their own sandbox, you end up with a different (much more limited) kind of interface for libraries.
like e.g. if we look at sandbox boundaries we have:
- some in-language permission enforcement (e.g. the Java Security Manager) -- this approach turned out to be a very bad idea
- process boundaries, i.e. take the boundary the OS enforces and lock it down more (e.g. by stuff like pledge, cgroups etc.) -- this approach turned out okayish
- VM boundaries (e.g. Firecracker VMs) -- turned out well
- emulation boundaries (e.g. WASM) -- mixed history, can turn out well, especially if combined with worker processes which lock themselves down
But what that means in practice is that wanting to reliably sandbox library dependencies will most likely lead to more or less IPC boundaries between the caller and the library.
And that in turn means it's unsuited for a lot of things:
e.g. for most utility libs it's very unsuited
e.g. for a lot of (but not all) data structure libs it's unsuited and might be a huge issue
e.g. you can apply it to a web server, but then you are basically reinventing CGI/AGI, which is okay but can't quite compete on performance
e.g. you can't apply it to some fundamental runtime engine (e.g. tokio); worse, you might now have one copy of the engine running per sandbox... (but you can apply it to some sub-parts of tokio internals)
People have tried this a lot in various ways.
But so far it has always died off in the long run.
It would be nice if the latest push based around WASM has some long-term success.
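For what it's worth, the WASM route looks roughly like this today with the wasmtime crate (plus anyhow for error handling). This is a sketch only; exact API details shift between wasmtime versions, and the `plugin.wasm` file with its exported `add` function is assumed here. The narrowness of what crosses the boundary is exactly the limitation described above.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    // The "dependency" is compiled to WebAssembly and loaded as data, not linked in.
    let engine = Engine::default();
    let module = Module::from_file(&engine, "plugin.wasm")?;

    // No WASI, no imports: the module gets no ambient access to files or sockets.
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    // Everything crossing the sandbox boundary is flattened to simple values.
    let add = instance.get_typed_func::<(i32, i32), i32>(&mut store, "add")?;
    println!("2 + 3 = {}", add.call(&mut store, (2, 3))?);
    Ok(())
}
```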
> The problem is that if you put library dependencies in their own sandbox, you end up with a different (much more limited) kind of interface for libraries.
Nobody said it would be easy. As an analogy, the borrow checker makes working with memory much more limited, yet some people like it because it makes things safer.
Thanks! This is a very detailed explanation of why existing sandboxing techniques will not work as expected for dependencies (wrt to functionality or performance).
They should take a look at OPAM (OCaml’s package manager). There was a really impressive talk at the OCaml Workshop at POPL or ICFP a couple of years ago about how it works. Basically, they have a huge CI infrastructure and keep all versions of every package ever published. So, once you’ve found the right set of dependencies for your project, you can be sure the exact versions will always be available via OPAM.
As a fellow rust developer, I love our dependencies but I put a lot of effort into pruning the ones I want to use. If I see a crate using too many I might contribute to it or find a replacement.
If you want to use dependencies, I wouldn't be surprised when you realise they also want to use dependencies. But you can put your money/time in the right places. Invest in the dependencies that do things well.
A thought experiment for this writer: imagine if Tokio (and all its dependencies) were moved into the Rust standard library, so that it was more like Go. Would that make them more comfortable depending on it (not that they'd have a choice any more)? If so, why?
This is a general problem: devs pulling in libraries instead of writing a few lines of code. Those libraries pull in more dependencies that have even more dependencies.
There's no good solution...
LLM coding assistants are a partial solution. Recently I typed
and a few tab-completes later it had filled in the body of the code with a correct color conversion routine. So that saved me searching for and pulling in some big-ass color library.

Most of lodash.js can be avoided with LLMs too. Lodash's loops are easier to remember than Javascript's syntax, but if your LLM just writes the foo.forEach((value, key) => {...}) for you, you can skip the syntactic sugar library.
Why, offer a patch that introduces the few lines and removes a dependency.
> do I even need this crate at all? 35 lines later I had the parts of dotenv I needed.
I'm not saying you copy-pasted those 35 lines from dotenvy, but for the sake of argument let's say you did: now you can't automatically benefit from dotenvy patching some security issue in those lines.
Can't benefit from them patching a security issue, but you don't suffer from their supply-chain risk either; what's the trade-off now?

You forgot to add: legal counsel asking why you used a random package that triggered a contractually obligated security audit for your biggest client.
What security issue? It just reads the file line by line, splits on =, and returns the values or calls setenv. This is not OpenSSL we're talking about.
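For reference, here is a minimal sketch of such a hand-rolled loader (intentionally naive: no escapes, no multi-line values, no variable interpolation). It returns the parsed pairs rather than setting them, since `std::env::set_var` is an unsafe function as of the 2024 edition and the caller should decide whether to use it.

```rust
use std::{fs, io};

/// Read a .env-style file: skip blanks and comments, split each remaining
/// line on the first '='. The caller can feed the pairs to std::env::set_var.
pub fn parse_env_file(path: &str) -> io::Result<Vec<(String, String)>> {
    let mut vars = Vec::new();
    for line in fs::read_to_string(path)?.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            // Strip optional surrounding quotes from the value.
            let value = value.trim().trim_matches('"').trim_matches('\'');
            vars.push((key.trim().to_owned(), value.to_owned()));
        }
    }
    Ok(vars)
}
```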
To benefit you have to actually trust the current and future maintainers of the package, its dependencies, the dependencies of its dependencies, etc. You can also automatically get breached in a supply chain attack, so it's a tradeoff
If you REALLY need such an update, you can easily subscribe to updates from the upstream project (in whatever way it allows) and patch your version when that rare situation occurs.
I'm using fewer and fewer libs, thanks to ChatGPT. Just write me whichever function I currently need
This feels like a pointless issue. At best, perhaps establish a good trust model rather than blaming the tooling.
Has anyone had good luck with cargo vet?
It lets security professionals cryptographically vouch for the trustworthiness of rust packages.
A large, removable standard library seems to be the optimal solution. It is there by default for everyone, but if needed, for embedded scenarios, it can be removed, leaving only the core language features.
> What's the solution?
A proper 'batteries included' standard library in the language and discouraging using too many libraries in a project.
The same mistakes from the Javascript community are being repeated in front of us for Cargo (and any other project that uses too many libraries).
When I am compiling Rust applications, I must admit I'm always rather bemused at the number of dependencies pulled. Even what I'd have thought to be simple tools easily reach about 200 dependent packages. It's nightmarish. One way this becomes particularly apparent is if you're trying to create a reproducible package for Guix or Nix. You end up having to manually specify a package for every different Rust library because of how those systems require reproducible builds. The process of writing Guix packages for software has been extremely illuminating for me as to just how deeply nested certain technologies are vs. others. I'd be willing to bet it's a good metric for what sticks around. If you've got 200 dependencies, I don't think your software is gonna last the test of time. It seems a recipe for endless churn.
Python is in the same spot now that you can't easily install packages globally. I don't have the hard drive space to develop multiple projects anymore. Every single project takes up multiple gigs in just dependencies.
True
"Not thinking about package management careful makes me sloppy."
Isn't the point of a memory safe language to allow programmers to be sloppy without repercussions, i.e., to not think about managing memory and even to not understand how memory works.
Would managing dependencies be any different. Does Rust allow programmers to avoid thinking carefully about selecting dependencies.
> Isn't the point of a memory safe language to allow programmers to be sloppy without repercussions, i.e., to not think about managing memory and even to not understand how memory works
No. The point is that even the best programmers of unsafe languages regularly introduce both simple and subtle bugs into codebases while being careful about handling memory correctly, and therefore we should use languages that don't even allow those bugs for most every use case. Using these languages still allows crap programmers to waste GBs of correctly allocated and handled memory, and good programmers to write tight, resource-sipping code.
Dependencies are orthogonal to this.
If careful programmers who can manage memory should use the same language as careless ones who cannot, then does this mean both should also automatically use third party libraries by default.
Are there systems languages that provide memory management but do not default to using third party libraries. If yes, then do these languages make it easier for programmers to avoid dependencies.
No, the point is to stop you from being sloppy. The code won't compile if you're sloppy with memory management.
You can be _relatively_ sure that you're not introducing memory unsafety by adding a dependency, but you can't be sure that it isn't malware unless you audit it.
Isn't the point of protective gear to allow people to be sloppy without repercussions?
> to be sloppy without repercussions
It's the difference between a wet mess and a dry one. Rust creates dry messes. It's still a mess.
> when checking a rust security advisory mentioning that dotenv is unmaintained
This is a problem with all languages and actually an area where Rust shines (due to editions). Your pulled in packages will compile as they previously did. This is not true for garbage collected languages (pun intended).
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust .... How could I ever audit all of that code?
Again, another area where Rust shines. You can audit and most importantly modify the code. This is not that easy if you were using Nodejs where the runtimes are behind node/v8 or whatever. You compile these things (including TLS) yourself and have full control over them. That's why Tokio is huge.
> This is not true for garbage collected languages
JavaScript is backwards compatible going back effectively forever, as is Java. Rust's unique system is having a way to make breaking changes to the language without breaking old code, not that they prioritize supporting old code indefinitely.
The libraries are a different story—you're likely to have things break under you that rely on older versions of libraries when you update—but I don't see Rust actually having solved that.
> You can audit and most importantly modify the code. This is not that easy if you were using Nodejs where the runtimes are behind node/v8 or whatever.
Node and V8 are open source, which makes the code just as auditable and modifiable as the 3.6 million lines of Rust. Which is to say, both are equally unapproachable.
> The libraries are a different story—you're likely to have things break under you that rely on older versions of libraries when you update—but I don't see Rust actually having solved that.
No language can fix that. However, I've lost count of the times my Python/JavaScript code has failed at runtime because of something in one of the dependencies. Usually it's not a JS/Python problem but rather has to do with a Node/Python version update. It always boils down to the "core" issue, which is the runtime. That's why I like that Rust gives me a "fixed" runtime that I download/compile/package with my program.
> Node and V8 are open source, which makes the code just as auditable and modifiable as the 3.6 million lines of Rust. Which is to say, both are equally unapproachable.
I've recently patched a weird bug under Tokio/Otel and can't imagine doing that with Node/V8 without it being a major hassle. It is relatively straightforward in Rust though requires maintaining your own fork of only the dependency/branch in question.
>Your pulled in packages will compile as they previously did. This is not true for garbage collected languages (pun intended).
What do you mean?