
Re: Threads



Keith Lucas wrote:

> We're considering using threading systems to make our software a bit
> easier to write and slightly more modular, and I was wondering what
> the "state of the art" in systems design for games using threading
> looks like.
> 
> Is it used?
> 
> Or is it still all too newfangled, like C++ was a while ago?
> 
> Are people doing coarse threading: one to do the rendering, one to do
> the AI, one to do the networking?
> 
> Or are they doing things like running a thread for every agent in the
> world to do its own AI?
> 
> I can see plusses and minuses all up and down, I'm looking for some
> datapoints...

How can I say this? Here are some facts:

 - On a single-processor machine, threads only add overhead (context
switches and locking)
 - On a multiprocessor machine, threads that touch the same data run
into bad cache-coherency problems
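To make the cache-coherency point concrete, here is a minimal C/pthreads sketch of "false sharing". It is my own illustration, not taken from any of the sources quoted here. Two threads each increment only their own counter, so nothing is logically shared. When both counters sit in the same cache line, though, the CPUs keep stealing that line from each other. The 64-byte line size and the names `run_pair`, `shared_counters` and `padded_counters` are assumptions made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Assumption: 64-byte cache lines (typical x86); the post gives no size. */
#define CACHE_LINE 64
#define ITERATIONS 10000000L

static_assert(CACHE_LINE > sizeof(long), "no room for padding");

/* a and b share one cache line: each write by one CPU invalidates
   the line in the other CPU's cache ("false sharing"). */
struct shared_counters {
    _Alignas(CACHE_LINE) long a;
    long b;
};

/* Padding pushes b into the next cache line, so each CPU keeps its own. */
struct padded_counters {
    _Alignas(CACHE_LINE) long a;
    char pad[CACHE_LINE - sizeof(long)];
    long b;
};

/* volatile stops the compiler from collapsing the loop into one add. */
static void *bump(void *arg)
{
    volatile long *counter = arg;
    for (long i = 0; i < ITERATIONS; i++)
        (*counter)++;
    return NULL;
}

/* Two threads, each incrementing only its own counter (nothing is
   logically shared). Returns wall-clock seconds, or -1.0 on error. */
double run_pair(long *x, long *y)
{
    struct timespec start, end;
    pthread_t t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&t1, NULL, bump, x) != 0)
        return -1.0;
    if (pthread_create(&t2, NULL, bump, y) != 0) {
        pthread_join(t1, NULL);
        return -1.0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Timing `run_pair(&s.a, &s.b)` against `run_pair(&p.a, &p.b)` on an SMP box should show the padded layout finishing noticeably faster. On a single CPU the two should take about the same time, because there is no second cache to keep coherent.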

In short, threads are bad. Check what Ingo Molnar (system performance
guru) says about threads:

| programming 'with threads' (ie.: with Linux threads that share
| page tables) is fundamentally more error-prone than coding isolated
| threads (ie. processes). This is why you see all those lazy Linux
| programmers using processes (ie. isolated threads) - if there is no
| need to share too much state, why go the error-prone path? Under
| Linux processes scale just as fine on SMP as threads.
| 
| the only area where 'all-shared-VM threads' are needed is where
| there is massive and complex interaction between threads. 98% of
| the programming tasks are not such. Additionally, on SMP systems
| threads are *fundamentally slower*, because there has to be
| (inevitable, hardware-mandated) synchronization between CPUs if
| shared VM is used.
| 
| this whole threading issue i believe comes from the fact that
| it's so hard and slow to program isolated threads (processes)
| under NT (NT processes are painfully slow to be created for
| example) - so all programming tasks which are performance-sensitive
| are forced to use all-shared-VM threads. Then this technological
| disadvantage of NT is spun into a magical 'using threads is
| better' mantra. IMHO it's a fundamentally bad (and rude) thing
| to force some stupid all-shared-VM concept on all multi-context
| programming tasks.
| 
| for example, the submitted SPECweb99 TUX results were done in a
| setup where every CPU was running an isolated thread. Windows 2000
| will never be able to do stuff like this without redesigning their
| whole OS, because processes are just so much fscked up there, and
| all the APIs (and programming tools) have this stupid bias towards
| all-shared-VM threads.
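For comparison, here is a minimal sketch of the "isolated threads" style Ingo describes, using plain fork() and pipes. After the fork, each worker process has its own copy-on-write address space. The only thing it shares with the parent is the single result it writes back. The workload (`sum_range`), `parallel_sum` and `MAX_WORKERS` are hypothetical and chosen only to keep the example short. Error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Hypothetical workload: sum of the integers in [lo, hi). */
static long sum_range(long lo, long hi)
{
    long total = 0;
    for (long i = lo; i < hi; i++)
        total += i;
    return total;
}

/* Sums [0, n) using `workers` isolated processes. Each child computes
   its slice in its own address space and reports through its own pipe.
   Returns -1 on bad arguments or on a pipe/fork/read failure. */
long parallel_sum(long n, int workers)
{
    int read_fd[MAX_WORKERS];
    pid_t pids[MAX_WORKERS];
    long total = 0;
    int failed = 0;

    if (workers < 1 || workers > MAX_WORKERS)
        return -1;

    for (int w = 0; w < workers; w++) {
        int fd[2];
        if (pipe(fd) != 0)
            return -1;

        long lo = n * w / workers;
        long hi = n * (w + 1) / workers;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                 /* child: compute, report, exit */
            close(fd[0]);
            long part = sum_range(lo, hi);
            ssize_t out = write(fd[1], &part, sizeof part);
            _exit(out == (ssize_t)sizeof part ? 0 : 1);
        }
        close(fd[1]);                   /* parent keeps only the read end */
        read_fd[w] = fd[0];
        pids[w] = pid;
    }

    for (int w = 0; w < workers; w++) {
        long part = 0;
        ssize_t got = read(read_fd[w], &part, sizeof part);
        close(read_fd[w]);
        waitpid(pids[w], NULL, 0);      /* reap the child */
        if (got == (ssize_t)sizeof part)
            total += part;
        else
            failed = 1;
    }
    return failed ? -1 : total;
}
```

For example, `parallel_sum(10, 3)` splits the range into [0,3), [3,6) and [6,10) and returns 45. No locks are needed anywhere, because the workers never see each other's memory.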

John Carmack tried using threads in Quake 3 to support multiple-CPU
systems, with one thread running the game and the other spewing
OpenGL primitives (so that the OpenGL driver chewed up time on the
other CPU). More than one thread per CPU turned out to be too much
overhead, and it took a LOT of work to make the two-thread version work
correctly. Whenever one thread touched data that the other thread also
used, the CPUs kept invalidating and reloading each other's cache lines
(BAD!) and exchanging coherency messages (which stalls the CPU). Net
result: the threaded version was SLOWER than the non-threaded version.
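Here is a rough sketch of the kind of split described above: the game thread produces each frame and the render thread consumes it. The data the two threads share is cut down to a single mailbox slot, handed over once per frame. This is NOT Carmack's code. The struct layout, the `mailbox_*` functions and `run_frames` are hypothetical. The render thread only counts commands, where a real one would issue OpenGL calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_CMDS 256

/* Hypothetical per-frame command list built by the game thread. */
struct frame {
    int ncmds;
    int cmds[MAX_CMDS];
};

/* One-slot mailbox: the ONLY data both threads touch, once per frame. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    struct frame slot;
    int full;   /* 1 while the slot holds a frame not yet rendered */
    int done;   /* set by the game thread after its last frame */
};

/* Game thread: wait for an empty slot, then copy the frame in. */
static void mailbox_put(struct mailbox *mb, const struct frame *f)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->full)
        pthread_cond_wait(&mb->changed, &mb->lock);
    mb->slot = *f;
    mb->full = 1;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

/* Render thread: returns 1 with a frame, or 0 once the game is done. */
static int mailbox_take(struct mailbox *mb, struct frame *out)
{
    pthread_mutex_lock(&mb->lock);
    while (!mb->full && !mb->done)
        pthread_cond_wait(&mb->changed, &mb->lock);
    int got = mb->full;
    if (got) {
        *out = mb->slot;
        mb->full = 0;
        pthread_cond_broadcast(&mb->changed);
    }
    pthread_mutex_unlock(&mb->lock);
    return got;
}

static void mailbox_close(struct mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    mb->done = 1;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

struct renderer {
    struct mailbox *mb;
    long rendered;  /* written only by the render thread */
};

/* Stand-in for the thread that feeds OpenGL; it just counts commands. */
static void *render_thread(void *arg)
{
    struct renderer *r = arg;
    struct frame f;
    while (mailbox_take(r->mb, &f))
        r->rendered += f.ncmds;  /* real code would issue GL calls here */
    return NULL;
}

/* Runs `frames` frames on the calling (game) thread and returns how many
   commands the render thread consumed, or -1 if it could not start. */
long run_frames(int frames)
{
    struct mailbox mb = {0};
    struct renderer r = { &mb, 0 };
    struct frame f = {0};
    pthread_t tid;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);
    if (pthread_create(&tid, NULL, render_thread, &r) != 0)
        return -1;

    for (int i = 0; i < frames; i++) {
        f.ncmds = i % MAX_CMDS;          /* pretend game logic */
        for (int c = 0; c < f.ncmds; c++)
            f.cmds[c] = i + c;
        mailbox_put(&mb, &f);
    }
    mailbox_close(&mb);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return r.rendered;
}
```

Copying the whole frame into the slot costs a memcpy per frame. In exchange, the two CPUs never write to the same game data, which is the kind of sharing that caused the cache flushing described above.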

With a lot of work, he finally managed to squeeze an improvement out of
this, but a smaller one than he had expected at the start. Do you think
you are a better coder than Carmack? ;-)

FYI, my day job is as a systems analyst for a vector supercomputer
company (NEC SX-4 and SX-5 machines with 16 and 32 vector processors and
up to 128 gigs of RAM). You know how NEC made the equivalent of threads
fast on their machines? The CPUs have NO CACHE. That's right! They just
made the memory so fast that no cache is needed. There is NO WAY to
have fast threads on a multiprocessor machine with caches, because of
the coherency problems, and supercomputer work is usually in the 2%
that Ingo is talking about.

How fast is the memory on a NEC supercomputer? Well, the clock is a bit
over 200 MHz and the memory runs at that clock too, but the memory bus
is 256 bytes wide, compared to the 64-bit-wide memory bus on PCs. Do the
calculation: it's a freakin' large bandwidth. :-)
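Here is the calculation as a rough peak-bandwidth estimate. It assumes one transfer per clock on both buses. It also assumes a 100 MHz memory bus on the PC side (PC100 SDRAM), which is not stated above, since only the bus widths are given. The NEC figure is a lower bound, because the clock is "a bit over" 200 MHz.

```latex
\begin{aligned}
B_{\text{NEC}} &\geq 256\ \text{bytes} \times 200 \times 10^{6}\ \text{s}^{-1} = 51.2 \times 10^{9}\ \text{bytes/s} \\
B_{\text{PC}}  &= 8\ \text{bytes} \times 100 \times 10^{6}\ \text{s}^{-1} = 0.8 \times 10^{9}\ \text{bytes/s} \\
\frac{B_{\text{NEC}}}{B_{\text{PC}}} &\geq \frac{256}{8} \times \frac{200}{100} = 32 \times 2 = 64
\end{aligned}
```

Bus width alone accounts for a factor of 32 (256 bytes vs. 8 bytes). Under the assumed PC clock, the clock difference supplies the remaining factor of 2.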

-- 
/* you are not expected to understand this */
 -- from the UNIX V6 kernel source
