SmackerNews

Fork() is evil; vfork() is goodness; afork() would be better; clone() is stupid

379 points · 338 comments · 4 years ago · __s

gist.github.com

infogulch · 4 years ago
The dense fog lifts, tree branches part, a ray of light beams down on a pedestal revealing the hidden intentions of the ancients. A plaque states "The operational semantics of the most basic primitives of your operating system are designed to simplify the implementation of shells." You hesitantly lift your eyes to the item presented upon the pedestal, take a pause in respect, then turn away slumped and disappointed but not entirely surprised. As you walk you shake your head trying to evict the afterimage of a beam of light illuminating a turd.
evmar · 4 years ago
In Ninja, which needs to spawn a lot of subprocesses but is otherwise not especially large in memory and doesn't use threads, we moved from fork to posix_spawn (which is the "I want fork+exec immediately, please do the smartest thing you can" wrapper) because it performed better on OS X and Solaris:
https://github.com/ninja-build/ninja/commit/89587196705f54af...
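For the plain "start a program and wait for it" case evmar describes, posix_spawnp() does the whole fork+exec in one call and leaves the strategy to the C library. A minimal sketch, not Ninja's actual code; spawn_and_wait() is a hypothetical helper name.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn argv[0] (searched on PATH) with argv and the current environment,
 * then wait for it. Returns the child's exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp returns an error number directly instead of setting errno. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    /* EINTR retry omitted for brevity. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The NULL arguments mean "no file actions, default attributes"; the child simply inherits the parent's descriptors and signal state.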
mattgreenrocks · 4 years ago
Long ago, I, like many Unix fans, thought that fork(2) and the fork-exec process spawning model were the greatest thing, and the Windows sucked for only having exec() and _spawn(), the last being a Windows-ism.
I appreciate this quite a bit. Vocal Unix proponents tend to believe that anything Unix does is automatically better than Windows, sometimes without even knowing what the Windows analogue is. Programming in both is necessary to have an informed opinion on this subject.
The one thing I miss most on Unix: the unified model of HANDLEs that enables you to WaitForMultipleObjects() on almost any system primitive you could want, such as an event and a socket (blocking I/O + a shutdown notification) in one call. On Unix, a flavor of select() tends to be the base primitive for waiting on things to happen, which means you end up writing adapter code to turn other resources into file descriptors, or need something like eventfd.
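A rough Unix counterpart to waiting on "a socket plus a shutdown event" in one call is poll() over the socket and an eventfd used as the event. A minimal, Linux-specific sketch; the helper names are hypothetical, and any readable descriptor (socket, pipe) works as data_fd.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum wait_result { WAIT_ERROR = -1, WAIT_DATA = 0, WAIT_SHUTDOWN = 1 };

/* Block until data_fd is readable or shutdown_fd (an eventfd) has been
 * signalled. If both are ready, shutdown wins. */
enum wait_result wait_data_or_shutdown(int data_fd, int shutdown_fd)
{
    struct pollfd fds[2] = {
        { .fd = data_fd,     .events = POLLIN },
        { .fd = shutdown_fd, .events = POLLIN },
    };

    if (poll(fds, 2, -1) < 0)
        return WAIT_ERROR;
    if (fds[1].revents & POLLIN)
        return WAIT_SHUTDOWN;
    /* POLLHUP (peer closed) is reported as data: read() will return 0. */
    if (fds[0].revents & (POLLIN | POLLHUP))
        return WAIT_DATA;
    return WAIT_ERROR;
}

/* Signal shutdown from any thread by adding 1 to the eventfd counter.
 * The eventfd stays readable until someone read()s it, so every waiter
 * that polls it sees the shutdown. */
int signal_shutdown(int shutdown_fd)
{
    uint64_t one = 1;
    return write(shutdown_fd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}
```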
Things I don't miss from Windows at all: wchar_t everywhere. :)
cryptonector · 4 years ago
Well, I'm surprised to see this on the front page, let alone as #1. Ask me anything.
EDIT: Also, don't miss @NobodyXu's comment on my gist[0], and don't miss @NobodyXu's aspawn[1].
```
  [0] https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234?permalink_comment_id=3467980#gistcomment-3467980 
  [1] https://github.com/NobodyXu/aspawn/
```
monocasa · 4 years ago
Hard disagree to most of this.
fork(2) makes a lot more sense when you realize its heritage. It came from a land before Unix supported full MMUs. In this model, to still have per-process address spaces and preemptive multitasking on what was essentially PC-DOS-level hardware, the kernel would checkpoint the memory for a process, slurp it all out to DECtape or some such, and load in the memory for whatever the scheduler wanted to run next. Its checkpoint-based simplicity wasn't a reaction to Windows-style calls (which wouldn't exist for almost a couple of decades), but to mainframe process-spawning abominations like JCL. The idea "you probably want most of what you have, so force a checkpoint, copy the checkpoint into a new slot, and continue separately from both checkpoints" was soooo much better than JCL and its tomes of incantations to do just about anything.
vfork(2) is an abomination. Even when the child returns, the parent now has a heavily modified stack if the child didn't immediately exec(). All of the bugs that causes are super fun to chase, lemme tell you. AFAIC, about the only valid use for vfork now is nommu systems, where fork() is incredibly expensive compared to what is generally expected.
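The traditional rule that avoids the stack corruption monocasa describes is that a vfork() child does nothing but call an exec function or _exit(). A minimal sketch of that pattern; vfork_exec() is a hypothetical helper and the paths used with it are assumptions.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run path with argv via vfork(). Returns the child's exit status,
 * 127 if exec failed, or -1 on error. */
int vfork_exec(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child borrows the parent's memory and stack: exec immediately,
         * and on failure leave with _exit(), never exit() or return. */
        execv(path, argv);
        _exit(127);
    }
    /* Parent resumes only after the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```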
clone(2) is great. Start from a checkpoint like fork, but instead of semantically copying everything, optionally share or not based on a bitmask. Share a tgid, virtual address space, and FD table? You just made a thread. Share nothing? You just made a process. It's the most 'mechanism, not policy' way I've seen to do context creation outside of maybe the L4 variants and the exokernels. This isn't an old holdover; this is how threads work today: processes spawned that happen to share resources. Modern archs on Linux don't even have a fork(2) syscall; it all happens through clone(2). Even vfork is clone set to share the virtual address space and nothing else that fork wouldn't share. Namespaces are a way to opt into not sharing resources that fork would normally share.
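The "share or not based on a bitmask" idea can be seen with a single flag. A minimal sketch, assuming Linux and glibc's clone() wrapper; clone_and_observe() is a hypothetical helper. The child writes a value, and the parent only sees it when CLONE_VM is set.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

static volatile int shared_value;

/* Child body: only touches memory, no libc calls. */
static int child_fn(void *arg)
{
    shared_value = *(int *)arg;
    return 0;
}

/* Clone a child with flags (plus SIGCHLD so waitpid works), let it store
 * value into shared_value, wait, and return what the parent then sees.
 * Returns -1 on error. */
int clone_and_observe(int flags, int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    int status, seen;
    pid_t pid;

    if (stack == NULL)
        return -1;
    shared_value = 0;
    /* Assumes a downward-growing stack (true on x86 and Arm): pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags | SIGCHLD, &value);
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }
    seen = shared_value;
    free(stack);
    return seen;
}
```

With flags of 0 the child writes into its own copy (fork-like), so the parent still sees 0; with CLONE_VM the write lands in the shared address space.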
And I don't see what afork gets you that clone doesn't, except afork isn't as general.
mark_undoio · 4 years ago
The code I currently work on actually has a use of `clone` with the `CLONE_VM` flag to create something that isn't a thread. Since `CLONE_VM` will share the entire address space with the child (you know, like a thread does!) a very reasonable response would be "WAT?!"
What led us here was a need to create an additional thread within an existing process's address space, but in a way that was non-disruptive: to the rest of the process it shouldn't really appear to exist.
We achieved this by using `CLONE_VM` (and a handful of other flags) to give the new "thread-like" entity access to the whole address space. But, we omitted `CLONE_THREAD`, as if we were making a new process. The new "thread-like" entity would not technically be part of the same thread group but would live in the same address space.
We also used two chained `clone()` calls (with the intermediate exiting, like when you daemonise) so that the new "thread-like" entity wouldn't be a child of the original process.
All this existed before I joined; it's just really cool that it works. I've never encountered such a non-standard use of clone before, but it was the right tool for this particular job!
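The core of the "thread-like" entity (CLONE_VM without CLONE_THREAD) can be sketched on Linux with glibc's clone(): the helper shares the address space but gets its own PID instead of joining the caller's thread group. This is not mark_undoio's code; using only CLONE_VM, the name spawn_vm_sharing_helper() and the 64 KiB stack are assumptions, and the double clone used to detach the helper is omitted.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HELPER_STACK_SIZE (64 * 1024)

/* Written by the helper, read by the parent: same address space. */
static volatile pid_t helper_seen_pid;

static int helper_fn(void *arg)
{
    (void)arg;
    /* Raw syscall rather than getpid(): keep libc use in the helper minimal. */
    helper_seen_pid = (pid_t)syscall(SYS_getpid);
    return 0;
}

/* Create a helper that shares memory (CLONE_VM) but is its own process
 * (no CLONE_THREAD). Stores clone()'s return value in *clone_ret and
 * returns the PID the helper saw for itself, or -1 on error. */
pid_t spawn_vm_sharing_helper(pid_t *clone_ret)
{
    char *stack = malloc(HELPER_STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    helper_seen_pid = 0;
    pid = clone(helper_fn, stack + HELPER_STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);
    *clone_ret = pid;
    return helper_seen_pid;
}
```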
Ericson2314 · 4 years ago
This stuff is still all confused.
Read http://catern.com/rsys21.pdf
What you want is:
1. create "embryonic" unscheduled process
2. Set it up from the parent process, it just lies on the operating table passively.
3. Submit it to the scheduler.
This is just... obviously correct. Totally flexible. Totally efficient. Hell, if you really want to fork anything, fork those embryonic processes, which have no active threads! Much safer and easier to understand!
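Standard POSIX has no truly unscheduled embryo that the parent edits in place, but posix_spawn's file actions come partway: the parent describes the child's setup first, and a single call then creates and starts it. A sketch with a hypothetical helper spawn_capture(), assuming `echo` is on PATH; the setup is limited to what file actions and spawn attributes can express.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run argv with stdout redirected into a pipe; copy up to len-1 bytes of
 * output into buf (NUL-terminated). Returns bytes captured, or -1. */
ssize_t spawn_capture(char *const argv[], char *buf, size_t len)
{
    posix_spawn_file_actions_t fa;
    int p[2], status, rc;
    pid_t pid;
    ssize_t total = 0, n;

    if (len == 0 || pipe(p) < 0)
        return -1;
    /* Describe the child's setup up front (error checks omitted)... */
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, p[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, p[0]);
    posix_spawn_file_actions_addclose(&fa, p[1]);
    /* ...then create and start it in one call. */
    rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(p[1]);
    if (rc != 0) {
        close(p[0]);
        return -1;
    }
    while ((size_t)total < len - 1 &&
           (n = read(p[0], buf + total, len - 1 - total)) > 0)
        total += n;
    close(p[0]);  /* a child still writing now gets EPIPE instead of blocking */
    buf[total] = '\0';
    waitpid(pid, &status, 0);
    return total;
}
```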
I did not write the paper above, but I did write
https://lore.kernel.org/lkml/f8457e20-c3cc-6e56-96a4-3090d7d...
https://lists.freebsd.org/archives/freebsd-arch/2022-January...
I hope I or someone else will have time to make it happen!
londons_explore · 4 years ago
I was always disappointed by the performance of fork()/clone().
CompSci class told me it was a very cheap operation, because all the actual memory is copy-on-write, so it's a great way to do all kinds of things.
But the reality is that duplicating huge page tables and hundreds of file handles is very slow. Like tens of milliseconds slow for a big process.
And then the process runs slowly for a long time after that because every memory access ends up causing lots of faults and page copying.
I think my CompSci class lied to me... it might seem cheap and a neat thing to do, but in reality there are very few use cases where it makes sense.
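To check numbers like these on your own machine, one rough approach is to time a fork() of a process whose memory is already faulted in. A minimal sketch; time_fork_ms() is a hypothetical helper, results vary by kernel and hardware, and it measures only creation, child exit and reaping, not the later copy-on-write faults the comment also mentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Touch `bytes` of heap so their pages are mapped, then time one
 * fork() + child _exit() + waitpid(). Returns elapsed ms, or -1 on error. */
double time_fork_ms(size_t bytes)
{
    struct timespec t0, t1;
    char *mem = malloc(bytes ? bytes : 1);
    int status;
    pid_t pid;

    if (mem == NULL)
        return -1;
    memset(mem, 1, bytes);  /* fault every page in before forking */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid = fork();
    if (pid == 0)
        _exit(0);  /* child does nothing: only creation/teardown is measured */
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        free(mem);
        return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(mem);
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it with increasing sizes (for example 1 MiB, 1 GiB) shows how fork cost scales with the mapped memory on a given system.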
scottlamb · 4 years ago
clone() is stupid ... the clone(2) design, or its maintainers, encourages a proliferation of flags, which means one must constantly pay attention to the possible need to add new flags at existing call sites.
IMHO a bigger problem [2] in practice with clone is that (according to glibc maintainers) once your program calls it, you can't call any glibc function anymore. [1] Essentially the raw syscall is a tool for the libc implementation to use. The libc implementation hasn't provided a wrapper for programs to use which maintains the libc's internal invariants about things like (IIUC) thread-local storage for errno.
The author's aforkx implementation is something that glibc maintainers could (and maybe should) provide, but my understanding is that you can get in trouble by implementing it yourself.
[1] https://github.com/rust-lang/rust/issues/89522#issuecomment-...
[2] editing to add: or at least a more concrete expression of the problem. Wouldn't surprise me if they haven't provided this wrapper in part because the proliferation the author mentioned makes it difficult for them to do so.
tych0 · 4 years ago
The problem with this argument is that the set of programs that just fork() and then exec() is fairly small. Sure, shells are small and do this, but then the article argues that shells are a good use of fork().
In larger programs, you're forking because you need to diverge the work that's going to be done and probably where it's going to be done (maybe you want to create a new pid ns, you need a separate mm because you're going to allocate a bunch, whatever). Maybe the argument is that programs should never do this? I don't buy that. Then there's a lot of string-slinging through exec().
ismaildonmez · 4 years ago
Microsoft Research has a paper about the very same issue (2019): https://www.microsoft.com/en-us/research/publication/a-fork-...
bergkvist · 4 years ago
I have to disagree that fork is evil. fork is great because of copy-on-write. I guess my particular use case is not very typical/common though.
I'm running powerflow simulations on a power grid model (several GB of memory to store the model). Copy-on-write means I can make small modifications to this model and run simulations in parallel. Thanks to fork/copy-on-write, I can run 32 simulations in parallel, each with small modifications, without requiring 32 times as much memory.
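The pattern bergkvist describes can be sketched as: build the large model once, then fork one child per variant; each child's write copies only the pages it touches, and the parent's model stays untouched. A minimal sketch under assumptions: run_variants() is a hypothetical helper, the "simulation" is a stand-in sum, and results come back through the 8-bit exit status (a real program would use a pipe or shared memory).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork one child per variant i: it adds 100 to model[i] in its
 * copy-on-write view, "simulates" by summing the model, and exits with the
 * low 8 bits of the sum. Fills results[0..nvariants-1]; returns 0 or -1.
 * (On a mid-loop fork failure, already-started children are not reaped.) */
int run_variants(int *model, size_t len, int nvariants, int *results)
{
    pid_t pids[64];

    if (nvariants <= 0 || nvariants > 64 || (size_t)nvariants > len)
        return -1;
    for (int i = 0; i < nvariants; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* This write copies one page; the rest stays shared with the parent. */
            model[i] += 100;
            long sum = 0;
            for (size_t j = 0; j < len; j++)
                sum += model[j];
            _exit((int)(sum & 0xff));
        }
        pids[i] = pid;
    }
    for (int i = 0; i < nvariants; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        results[i] = WEXITSTATUS(status);
    }
    return 0;
}
```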
psanford · 4 years ago
I saw a bug once where an application would get way slower on MacOS after calling fork(). Not just temporarily either; many syscalls would continue to run slowly from the call to fork() until the process exited.
Looking on Stack Overflow, I see a few reports of this behavior[0][1].
[0]: https://stackoverflow.com/questions/4411840/memory-access-af...
[1]: https://stackoverflow.com/questions/27932330/why-is-tzset-a-...
throwaway984393 · 4 years ago
I don't think containers should be like jails. Containers should be more like chroots than they are now.
Have you ever tried to run a modern X/whatever app with 3D graphics and audio and DBUS and God knows what else in a container and get it to show up on your desktop? It's a fucking nightmare. I spent over a week trying to get 1Password to run in a container. Somebody decided containers had to be "secure", even though they don't actually exist as a single concept and security was never their primary purpose. If instead containers were used only to isolate filesystem dependencies, we could actually pretend containers were like normal applications and treat them with the same lack of security concern as all the rest of our non-containerized programs.
Firecracker is the correct abstraction for isolation: a micro-VM. That is the model you want if you want to run an app securely (not to mention reliably, as it can come with its own kernel, rather than needing you to run a compatible host kernel).
jph · 4 years ago
Is it a fair point to implement first with fork() because of memory protection, then optimize by using benchmarks and potentially vfork() for speed? Benchmark areas can look at synchronous locks, copy-on-write memory, stack sharing, etc.
What are good practices around the security tradeoffs of fork() vs. vfork(), especially in terms of ease of writing correct code? I'd thought that fork() + exec() tends to favor thinking about clearer separation/isolation. For example, I've written small daemons using fork() + exec() because it seems safe and easy to do at the start.
quietbritishjim · 4 years ago
Apologies if this is a silly question, but it seems like there's a false dichotomy here:
(1) You have separate fork() (etc.) and exec(), so that in the brief window in between you can set all the properties of the new process using APIs that exist anyway for controlling your own process.
(2) You have a single call to spawn a new process, but you have a million different options to control every aspect of the new process.
Why not do it this other way instead? Perhaps it's a bit late now, but in retrospect it seems like it would give the API simplicity of fork+exec without any of the complications.
(3) There are two steps to run a new process. The first fully sets up its memory and returns a PID, but doesn't start running it. The second call, unfreeze(), allows it to begin executing code. All the usual APIs that exist anyway for controlling your own process take an extra parameter specifying the PID of a frozen child (or -1 for the current process).
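Something close to option (3) can be approximated today with fork(): park the child on a pipe until the parent has configured it through calls that already take a PID, then release it. A minimal sketch; frozen_child_nice() is a hypothetical name and setpriority() stands in for "any per-PID API". Unlike a truly frozen process, the child has already run some code (up to its read()) when it parks.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stays parked on a pipe until the parent has raised its
 * nice value by nice_incr via its PID. Returns the nice value the child
 * observed after release, or -1000 on error. */
int frozen_child_nice(int nice_incr)
{
    int gate[2], status, ok;
    pid_t pid;

    if (pipe(gate) < 0)
        return -1000;
    pid = fork();
    if (pid < 0) {
        close(gate[0]);
        close(gate[1]);
        return -1000;
    }
    if (pid == 0) {
        char go;
        close(gate[1]);
        /* Parked: one byte means "start", EOF means "abort". */
        if (read(gate[0], &go, 1) != 1)
            _exit(255);
        /* Report our nice value (-20..19) shifted into 0..39. */
        _exit(getpriority(PRIO_PROCESS, 0) + 20);
    }
    close(gate[0]);
    /* Configure the parked child from outside, by PID. */
    ok = setpriority(PRIO_PROCESS, pid,
                     getpriority(PRIO_PROCESS, 0) + nice_incr) == 0;
    /* Unfreeze it; on failure, closing the pipe makes it abort instead. */
    if (ok)
        ok = write(gate[1], "g", 1) == 1;
    close(gate[1]);
    if (waitpid(pid, &status, 0) < 0 || !ok || !WIFEXITED(status))
        return -1000;
    return WEXITSTATUS(status) - 20;
}
```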
lisper · 4 years ago
There is something about fork which I have never understood. Maybe someone here can explain it to me.
Why would anyone ever want fork as a primitive? It seems to me that what you really want is a combination of fork and exec because 99% of the time you immediately call exec after fork (at least that's what I do 99% of the time when I use fork). If you know that you're going to call exec immediately after fork, then all the issues of dealing with the (potentially large) address space of the parent just evaporate because the child process is just going to immediately discard it all.
So why is there not a fork-exec combo? And why has it not replaced fork for 99% of use cases?
And as long as I'm asking stupid questions, why would anyone ever use vfork? If the child shares the parent's address space and uses the same stack as the parent, and the parent has to block, how is that different from a function call (other than being more expensive)?
None of this makes sense to me.
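One common answer to why fork and exec are separate: the gap between them lets the child configure itself with ordinary calls that act on "the current process", such as chdir() and dup2(), before exec replaces its image. A minimal sketch with a hypothetical helper run_in_dir_capture(), assuming `pwd` is on PATH. (posix_spawn covers the redirect, but changing directory needed a later extension such as glibc's posix_spawn_file_actions_addchdir_np.)

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched on PATH) inside directory dir, capturing up to
 * len-1 bytes of its stdout into buf (NUL-terminated).
 * Returns bytes captured, or -1 on error. */
ssize_t run_in_dir_capture(const char *dir, char *const argv[],
                           char *buf, size_t len)
{
    int p[2], status;
    ssize_t total = 0, n;
    pid_t pid;

    if (len == 0 || pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Between fork and exec, plain "current process" calls set up the child. */
        if (chdir(dir) < 0 || dup2(p[1], STDOUT_FILENO) < 0)
            _exit(126);
        close(p[0]);
        close(p[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(p[1]);
    while ((size_t)total < len - 1 &&
           (n = read(p[0], buf + total, len - 1 - total)) > 0)
        total += n;
    close(p[0]);
    buf[total] = '\0';
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return total;
}
```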
JoeAltmaier · 4 years ago
The whole idea of fork is strange - the design pattern of "child process is executing exactly where the parent process is executing" is foreign to me. Don't we want to direct where the child process is executing? Like, when creating a thread? Why is fork() so conceptually orthogonal to that? Is there a good reason? A historical reason?
I don't find fork() to be obvious or useful or natural. I work hard to never do it.
immibis · 4 years ago
Another option is to allow the parent to create an empty child process, and then make arbitrary system calls and execute code in the child, like a debugger does. In most cases the last "remote system call" would be exec.
saurik · 4 years ago
One use case for fork()--which is used extensively on Android--is to build an expensive template process that can then be replicated for later work, which is exactly what people often want for the behavior with virtual machines. I wrote an article on the history of linking and loading optimizations leading up to how Android handles their "zygote" which touches on this behavior.
http://www.cydiasubstrate.com/id/727f62ed-69d3-4956-86b2-bc0...
albertzeyer · 4 years ago
We had the case that some library we were using (OpenBLAS) used pthread_atfork. Unfortunately, the atfork handler was buggy in certain situations involving multiple threads and caused a crash. This was annoying because we basically did not need fork at all, just fork+exec (for various other libraries spawning subprocesses), where those atfork handlers would not be relevant.
Our solution was to override pthread_atfork to ignore any registered handlers and, in case this was not enough, also override fork itself to do the syscall directly without calling the atfork handlers.
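The root of the problem albertzeyer describes is that every handler registered with pthread_atfork() runs on every fork(), whether or not the child is about to exec. A minimal sketch, with hypothetical helper names, that makes those calls visible; POSIX leaves it to the implementation whether posix_spawn() runs these handlers.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counters bumped by the fork handlers. */
static int prepare_calls, parent_calls, child_calls;

static void on_prepare(void) { prepare_calls++; }  /* parent, before fork */
static void on_parent(void)  { parent_calls++; }   /* parent, after fork */
static void on_child(void)   { child_calls++; }    /* child, after fork */

int install_fork_counters(void)
{
    return pthread_atfork(on_prepare, on_parent, on_child);
}

/* fork() once; the child exits with its own child_calls value so the
 * parent can confirm the child handler ran there. Returns it, or -1. */
int fork_and_report_child_calls(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_calls);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

After one fork, the parent's prepare and parent counters are each 1, while only the child's copy of child_calls was incremented.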
https://github.com/tensorflow/tensorflow/issues/13802 https://github.com/xianyi/OpenBLAS/issues/240 https://trac.sagemath.org/ticket/22021 https://bugs.python.org/issue31814 https://stackoverflow.com/questions/46845496/ld-preload-and-... https://stackoverflow.com/questions/46810597/forkexec-withou...
lucideer · 4 years ago
The good/evil/etc. here seem to be defined exclusively around "performance above all else", and - more specifically - performant primitives over performant application architecture.
It strikes me that performance gains associated with sharing address space & stack are similar to many performance gains: trade-offs. So calling them "good" and "evil" when performance is seemingly your sole goal and interest seems a bit forward.
mrob · 4 years ago
Fork() is the second worst idea in programming, behind null pointers. Fork() is the reason overcommit exists, which is the reason my web browser crashes if I open too many tabs, and the reason the "safe" Rust programming language leaves software vulnerable to DOS attacks if it uses the standard library. It's a clear example of "worse is worse", and we should have switched to the Microsoft Windows model decades ago.
Here's a paper from Microsoft Research supporting this point of view:
https://www.microsoft.com/en-us/research/uploads/prod/2019/0...
mywacaday · 4 years ago
"I won't bother explaining what fork(2) is -- if you're reading this, I assume you know.", If that applied to everything I looked at from HN I'd read precious little.
phendrenad2 · 4 years ago
For those saying to use posix_spawn: What am I supposed to make of the writeup in the posix_spawn manpage though?
"...specified by POSIX to provide a standardized method of creating new processes on machines that lack the capability to support the fork(2) system call. These machines are generally small, embedded systems lacking MMU support"
Is this why no one uses it? It has this gratuitous opinion piece at the beginning that makes people think it's just for embedded systems and my dad's Amiga?
aylmao · 4 years ago
Meta comment: Github Gist seems to be great for blogging. Yeah, the UI is not very blog-specific, but it has all the useful features, and then some: markdown, comments, hosting, an index of all posts, some measure of popularity (stars), a very detailed edit history, etc.
All without having to pay or setup anything yourself.
throwawaylinux · 4 years ago
This avfork implementation is poor. You don't want to make your single threaded programs multi-threaded. I don't really get the big benefit of afork over other existing mechanisms other than handwaving about things being evil.
Also,
Linux should have had a thread creation system call -- it would have then saved itself the pain of the first pthread implementation for Linux. Linux should have learned from Solaris/SVR4, where emulation of BSD sockets via libsocket on top of STREAMS proved to be a very long and costly mistake. Emulating one API from another API with impedance mismatches is difficult at best.
Linux does have a thread creation system call. It's clone(2). It literally creates new threads of execution with various properties. It does not "emulate" threads, it is threads.

Some links I found today researching this:

- @famzah's blog about fork vs vfork vs clone performance:

  https://blog.famzah.net/tag/fork-vfork-popen-clone-performance/

- A very similar idea to my afork() idea, from 2 years earlier:

  https://developers.redhat.com/blog/2015/08/19/launching-helper-process-under-memory-and-latency-constraints-pthread_create-and-vfork

- misc

  https://inbox.vuxu.org/tuhs/CAEoi9W6HFL3UcnWkKoqka8Dt16MWskKd6yEJr3HYCcCT9pMTig@mail.gmail.com/T/

  https://bugzilla.redhat.com/show_bug.cgi?id=682922 (see attachments)

kazinator · 4 years ago
Concurrently running dupe currently on front page: https://smackernews.com/item/30499169 HN
:) :) :)
sys_64738 · 4 years ago
The intent of fork() is to start a new process in its own address space. *fork() variations that run in the SAME address space are confusing. A use case today for fork() might also be sandboxing apps; I'd certainly expect browsers to use this approach to spawn isolated pages. But generally fork() is very specific, from my recollection.
ridiculous_fish · 4 years ago
Amusingly vfork semantics differ across OSes. This program prints 42 in Linux but 1 on Mac: https://godbolt.org/z/jn7Gaf5Me because on Linux they share address space.
userbinator · 4 years ago
I started with DOS, where spawn() is the norm, so I've always considered the fork()-like behaviour to be unusual yet handy for certain use-cases. Perhaps a system call that offers a combination of the two behaviours should be named spork().
harry8 · 4 years ago
Fork with cow is inefficient.
Compared to what? In what dimension? Any numbers on that? Where is the trade-off? To what extent does anyone need to care and on what circumstances?
tiffanyh · 4 years ago
Slightly off topic: how does Erlang handle this? Isn't it known for having extremely fast and cheap process spawning baked in (with isolation)?
switch33 · 4 years ago
The problem is clone is more of a start phase after vfork but before fork regardless for github. So it's kind of a bit strange that we call vfork first but that is about templates too.
As for templates they need to be in different languages and in different formats for video games consoles, and so many other formats they port systems and games that sort of work digitally to certain things but not playable to certain things too.
The other problem is that clone is part of syscall interfaces and part of apis and part of a lot of other things too.
[deleted]
pipeline_peak · 4 years ago
Your idea good
Your idea stupid
I’m not woke by any means, idk what it is about low level programming but calling someone’s idea “stupid” is a really shitty thing to say.
“He chose to take it personally” is the type of lazy, pseudo-stoic argument I have no interest in reading.
Yes I’m having a morning, lol.

news.ycombinator.com/item?id=30502392