379 points · 338 comments · 4 years ago · __s
gist.github.com
infogulch
evmar
https://github.com/ninja-build/ninja/commit/89587196705f54af...
mattgreenrocks
Long ago, I, like many Unix fans, thought that fork(2) and the fork-exec process spawning model were the greatest thing, and that Windows sucked for only having exec() and _spawn(), the last being a Windows-ism.
I appreciate this quite a bit. Vocal Unix proponents tend to believe that anything Unix does is automatically better than Windows, sometimes without even knowing what the Windows analogue is. Programming in both is necessary to have an informed opinion on this subject.
The one thing I miss most on Unix: the unified model of HANDLEs that enables you to WaitForMultipleObjects() on almost any system primitive you could want, such as an event together with a socket (blocking I/O + a shutdown notification) in one call. On Unix, a flavor of select() tends to be the base primitive for waiting on things to happen, which means you end up writing adapter code to turn other resources into file descriptors, or need something like eventfd.
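A minimal sketch of that adapter pattern, in Python for brevity: select() only understands file descriptors, so the shutdown notification has to become one. This version assumes a plain pipe (the self-pipe trick) instead of eventfd, purely for portability, and both function names are hypothetical.

```python
import os
import select
import socket


def make_shutdown_pipe():
    """Return (read_fd, write_fd); writing one byte to write_fd signals shutdown."""
    return os.pipe()


def wait_for_data_or_shutdown(sock, shutdown_fd, timeout=None):
    """Block until sock is readable or shutdown was signalled.

    Returns "shutdown", "data", or "timeout".
    """
    readable, _, _ = select.select([sock, shutdown_fd], [], [], timeout)
    if shutdown_fd in readable:
        return "shutdown"  # shutdown takes priority when both are ready
    if sock in readable:
        return "data"
    return "timeout"
```

Another thread (or a signal handler) would call `os.write(write_fd, b"\0")` to wake the waiter, which is roughly the role an event HANDLE plays in the Windows call.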
Things I don't miss from Windows at all: wchar_t everywhere. :)
cryptonector
EDIT: Also, don't miss @NobodyXu's comment on my gist, and don't miss @NobodyXu's aspawn[1].
[0] https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234?permalink_comment_id=3467980#gistcomment-3467980
[1] https://github.com/NobodyXu/aspawn/
monocasa
fork(2) makes a lot more sense when you realize its heritage. It came from a land before Unix supported full MMUs. In this model, to still have per-process address spaces and preemptive multitasking on what was essentially a PC-DOS level of hardware, the kernel would checkpoint the memory for a process, slurp it all out to DECtape or some such, and load in the memory for whatever the scheduler wanted to run next. Its simplicity of being process-checkpoint based wasn't a reaction to Windows-style calls (which wouldn't exist for almost a couple of decades), but instead to mainframe process spawning abominations like JCL. The idea "you probably want most of what you have so force a checkpoint, copy the checkpoint into a new slot, and continue separately from both checkpoints" was soooo much better than JCL and its tomes of incantations to do just about anything.
vfork(2) is an abomination. Even when the child returns, the parent now has a heavily modified stack if the child didn't immediately exec(). All of the bugs that causes are super fun to chase, lemme tell you. AFAIC, about the only valid use for vfork now is nommu systems, where fork() is incredibly expensive compared to what is generally expected.
clone(2) is great. Start from a checkpoint like fork, but instead of semantically copying everything, optionally share or not based on a bitmask. Share a tgid, virtual address space, and FD table? You just made a thread. Share nothing? You just made a process. It's the most 'mechanism, not policy' way I've seen to do context creation outside of maybe the L4 variants and the exokernels. This isn't an old holdover; this is how threads work today: processes spawned that happen to share resources. Modern archs on Linux don't even have a fork(2) syscall; it all happens through clone(2). Even vfork is clone set to share the virtual address space and nothing else that fork wouldn't share. Namespaces are a way to opt into not sharing resources that fork would normally share.
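Python's standard library has no wrapper that takes clone flags directly, so the following is only a sketch of the two endpoints of that bitmask: a thread (address space shared) and a forked process (address space copied). The helper names are hypothetical.

```python
import os
import threading

state = {"value": 0}


def bump():
    state["value"] += 1


def bump_in_thread():
    """Run bump() in a thread: memory is shared, so the caller sees the write."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return state["value"]


def bump_in_forked_child():
    """Run bump() in a forked child: the write lands in the child's private copy."""
    pid = os.fork()
    if pid == 0:
        try:
            bump()
        finally:
            os._exit(0)  # never fall back into the parent's code path
    os.waitpid(pid, 0)
    return state["value"]
```

Starting from zero, the thread's increment is visible to the caller afterwards, while the forked child's increment is not.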
And I don't see what afork gets you that clone doesn't, except afork isn't as general.
mark_undoio
What led us here was a need to create an additional thread within an existing process's address space but in a way that was non-disruptive - to the rest of the process it shouldn't really appear to exist.
We achieved this by using `CLONE_VM` (and a handful of other flags) to give the new "thread-like" entity access to the whole address space. But, we omitted `CLONE_THREAD`, as if we were making a new process. The new "thread-like" entity would not technically be part of the same thread group but would live in the same address space.
We also used two chained `clone()` calls (with the intermediate exiting, like when you daemonise) so that the new "thread-like" wouldn't be a child of the original process.
All this existed before I joined; it's just really cool that it works. I've never encountered such a non-standard use of clone before, but it was the right tool for this particular job!
Ericson2314
Read http://catern.com/rsys21.pdf
What you want is:
1. Create an "embryonic" unscheduled process.
2. Set it up from the parent process; it just lies on the operating table passively.
3. Submit it to the scheduler.
This is just... obviously correct. Totally flexible. Totally efficient. Hell, if you really want to fork anything, fork those embryonic processes, which have no active threads! Much safer and easier to understand!
I did not write the paper above, but I did write
https://lore.kernel.org/lkml/f8457e20-c3cc-6e56-96a4-3090d7d...
https://lists.freebsd.org/archives/freebsd-arch/2022-January...
I hope I or someone else will have time to make it happen!
londons_explore
CompSci class told me it was a very cheap operation, because all the actual memory is copy-on-write, so it's a great way to do all kinds of things.
But the reality is that duplicating huge page tables and hundreds of file handles is very slow. Like tens of milliseconds slow for a big process.
And then the process runs slowly for a long time after that because every memory access ends up causing lots of faults and page copying.
I think my CompSci class lied to me... it might seem cheap and a neat thing to do, but the reality is there are very few use cases where it makes sense.
scottlamb
clone() is stupid ... the clone(2) design, or its maintainers, encourages a proliferation of flags, which means one must constantly pay attention to the possible need to add new flags at existing call sites.
IMHO a bigger problem [2] in practice with clone is that (according to glibc maintainers) once your program calls it, you can't call any glibc function anymore. [1] Essentially the raw syscall is a tool for the libc implementation to use. The libc implementation hasn't provided a wrapper for programs to use which maintains the libc's internal invariants about things like (IIUC) thread-local storage for errno.
The author's aforkx implementation is something that glibc maintainers could (and maybe should) provide, but my understanding is that you can get in trouble by implementing it yourself.
[1] https://github.com/rust-lang/rust/issues/89522#issuecomment-...
[2] editing to add: or at least a more concrete expression of the problem. Wouldn't surprise me if they haven't provided this wrapper in part because the proliferation the author mentioned makes it difficult for them to do so.
tych0
In larger programs, you're forking because you need to diverge the work that's going to be done and probably where it's going to be done (maybe you want to create a new pid ns, you need a separate mm because you're going to allocate a bunch, whatever). Maybe the argument is that programs should never do this? I don't buy that. Then there's a lot of string-slinging through exec().
ismaildonmez
bergkvist
I'm running power-flow simulations on a power grid model (several GB of memory to store the model). Copy-on-write means I can make small modifications to this model and run simulations in parallel. Thanks to fork/copy-on-write, I can run 32 simulations in parallel, each with small modifications, without requiring 32 times as much memory.
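A rough sketch of that pattern, not the commenter's actual code: each forked child edits its own copy-on-write view of a shared model and sends one number back over a pipe. The function name, the `(index, value)` variant format, and the use of a flat `array` of doubles as the "model" are all assumptions for illustration.

```python
import os
import struct
from array import array


def run_variants(model, variants, simulate):
    """Fork one child per (index, value) variant and collect simulate(model) from each.

    Each child writes only to its private copy-on-write view of model,
    so the parent's model is left untouched.
    """
    children = []
    for index, value in variants:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                os.close(read_fd)
                model[index] = value  # touches only the page(s) holding this entry
                os.write(write_fd, struct.pack("d", simulate(model)))
            finally:
                os._exit(0)  # never return into the parent's loop
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        data = os.read(read_fd, 8)  # one packed double per child
        os.close(read_fd)
        os.waitpid(pid, 0)
        results.append(struct.unpack("d", data)[0])
    return results
```

For example, `run_variants(array("d", [1.0] * 1000), [(0, 2.0), (1, 5.0)], sum)` runs two children in parallel. Note that in CPython, reference counting writes to object headers, so ordinary Python objects share pages less effectively than a flat buffer like `array`.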
psanford
Looking on Stack Overflow, I see a few reports of this behavior[0][1].
[0]: https://stackoverflow.com/questions/4411840/memory-access-af...
[1]: https://stackoverflow.com/questions/27932330/why-is-tzset-a-...
throwaway984393
Have you ever tried to run a modern X/whatever app with 3D graphics and audio and DBUS and God knows what else in a container and get it to show up on your desktop? It's a fucking nightmare. I spent over a week trying to get 1Password to run in a container. Somebody decided containers had to be "secure", even though they don't actually exist as a single concept and security was never their primary purpose. If instead containers were used only to isolate filesystem dependencies, we could actually pretend containers were like normal applications and treat them with the same lack of security concern with which we treat all the rest of our non-containerized programs.
Firecracker is the correct abstraction for isolation: a micro-VM. That is the model you want if you want to run an app securely (not to mention reliably, as it can come with its own kernel, rather than needing you to run a compatible host kernel).
jph
What are the good practices or security tradeoffs of fork() vs. vfork(), especially in terms of ease of writing correct code? I'd thought that fork() + exec() tends to favor thinking about clearer separation/isolation. For example, I've written small daemons using fork() + exec() because it seems safe and easy to do at the start.
quietbritishjim
(1) You have separate fork() (etc.) and exec(), so that in the brief window in between you can set all the properties of the new process using APIs that exist anyway for controlling your own process.
(2) You have a single call to spawn a new process, but you have a million different options to control every aspect of the new process.
Why not do it this other way instead? Perhaps a bit late now but seems like in retrospect it would give the API simplicity of fork+exec without any of the complications.
(3) There are two steps to run a new process. The first fully sets up its memory and returns a PID, but doesn't start running it. The second call, unfreeze(), allows it to begin executing code. All the usual APIs that exist anyway for controlling your own process take an extra parameter specifying the PID of a frozen child (or -1 for the current process).
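No such freeze/unfreeze API exists today, so the following is only a user-space approximation under stated assumptions: the child is forked, parks itself on a pipe before exec, and the parent configures it by PID before releasing it. `spawn_frozen` and `unfreeze` are hypothetical names, and unlike the proposal the child is still created by fork().

```python
import os


def spawn_frozen(argv):
    """Fork a child that waits for release before exec'ing argv.

    Returns (pid, release_fd). The child does nothing until unfreeze() is called.
    """
    gate_r, gate_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(gate_w)
            released = os.read(gate_r, 1)  # "frozen": block until the parent writes
            os.close(gate_r)
            if released:
                os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed or the parent gave up
    os.close(gate_r)
    return pid, gate_w


def unfreeze(release_fd):
    """Let the frozen child proceed to exec."""
    os.write(release_fd, b"\0")
    os.close(release_fd)
```

A caller might do `pid, gate = spawn_frozen(["sh", "-c", "exit 7"])`, then apply per-PID calls that already exist, such as `os.setpgid(pid, pid)`, and finally `unfreeze(gate)`. Only the few settings that already accept a PID can be applied this way, which is the gap the proposal is pointing at.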
lisper
Why would anyone ever want fork as a primitive? It seems to me that what you really want is a combination of fork and exec because 99% of the time you immediately call exec after fork (at least that's what I do 99% of the time when I use fork). If you know that you're going to call exec immediately after fork, then all the issues of dealing with the (potentially large) address space of the parent just evaporate because the child process is just going to immediately discard it all.
So why is there not a fork-exec combo? And why has it not replaced fork for 99% of use cases?
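For reference, POSIX does specify a combined call, posix_spawn(), and Python exposes it as `os.posix_spawn` / `os.posix_spawnp`. A minimal sketch, assuming all the child needs is its stdout redirected to a file; the wrapper name is hypothetical.

```python
import os


def spawn_with_stdout_to(path, argv):
    """Start argv (looked up on PATH) with stdout redirected to path.

    The redirection is described as a file action that is applied in the
    child, so no Python code runs between process creation and exec.
    """
    file_actions = [
        # In the child: open path as file descriptor 1 (stdout).
        (os.POSIX_SPAWN_OPEN, 1, path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644),
    ]
    return os.posix_spawnp(argv[0], argv, os.environ, file_actions=file_actions)
```

The returned PID is reaped with `os.waitpid(pid, 0)` as usual. Anything that cannot be expressed as a file action or spawn attribute still requires fork() plus exec(), which is the trade-off the rest of this thread argues about.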
And as long as I'm asking stupid questions, why would anyone ever use vfork? If the child shares the parent's address space and uses the same stack as the parent, and the parent has to block, how is that different from a function call (other than being more expensive)?
None of this makes sense to me.
JoeAltmaier
I don't find fork() to be obvious or useful or natural. I work hard to never do it.
immibis
saurik
http://www.cydiasubstrate.com/id/727f62ed-69d3-4956-86b2-bc0...
albertzeyer
Our solution was to override pthread_atfork to ignore any functions, and in case this is not enough, also fork itself to just directly do the syscall without calling the atfork handlers.
https://github.com/tensorflow/tensorflow/issues/13802
https://github.com/xianyi/OpenBLAS/issues/240
https://trac.sagemath.org/ticket/22021
https://bugs.python.org/issue31814
https://stackoverflow.com/questions/46845496/ld-preload-and-...
https://stackoverflow.com/questions/46810597/forkexec-withou...
lucideer
It strikes me that performance gains associated with sharing address space & stack are similar to many performance gains: trade-offs. So calling them "good" and "evil" when performance is seemingly your sole goal and interest seems a bit forward.
mrob
Here's a paper from Microsoft Research supporting this point of view:
https://www.microsoft.com/en-us/research/uploads/prod/2019/0...
mywacaday
phendrenad2
"...specified by POSIX to provide a standardized method of creating new processes on machines that lack the capability to support the fork(2) system call. These machines are generally small, embedded systems lacking MMU support"
Is this why no one uses it? It has this gratuitous opinion piece at the beginning that makes people think it's just for embedded systems and my dad's Amiga?
aylmao
All without having to pay or setup anything yourself.
throwawaylinux
Also,
Linux should have had a thread creation system call -- it would have then saved itself the pain of the first pthread implementation for Linux. Linux should have learned from Solaris/SVR4, where emulation of BSD sockets via libsocket on top of STREAMS proved to be a very long and costly mistake. Emulating one API from another API with impedance mismatches is difficult at best.
Linux does have a thread creation system call. It's clone(2). It literally creates new threads of execution with various properties. It does not "emulate" threads, it is threads.
cryptonector
- @famzah's blog about fork vs vfork vs clone performance:
https://blog.famzah.net/tag/fork-vfork-popen-clone-performance/
- A very similar idea to my afork() idea, from 2 years earlier:
https://developers.redhat.com/blog/2015/08/19/launching-helper-process-under-memory-and-latency-constraints-pthread_create-and-vfork
- misc
https://inbox.vuxu.org/tuhs/CAEoi9W6HFL3UcnWkKoqka8Dt16MWskKd6yEJr3HYCcCT9pMTig@mail.gmail.com/T/
https://bugzilla.redhat.com/show_bug.cgi?id=682922 (see attachments)
kazinator
sys_64738
ridiculous_fish
userbinator
harry8
Compared to what? In what dimension? Any numbers on that? Where is the trade-off? To what extent does anyone need to care and on what circumstances?
tiffanyh
switch33
As for templates they need to be in different languages and in different formats for video games consoles, and so many other formats they port systems and games that sort of work digitally to certain things but not playable to certain things too.
The other problem is that clone is part of syscall interfaces and part of apis and part of a lot of other things too.
[deleted]
pipeline_peak
Your idea stupid
I’m not woke by any means, idk what it is about low level programming but calling someone’s idea “stupid” is a really shitty thing to say.
“He chose to take it personally” is the type of lazy, pseudo-stoic argument I have no interest in reading.
Yes I’m having a morning, lol.