#344
TF2 kills IPC, only clockrate matters.
Also, why are you so salty about him getting better results than you?
If you had read the post properly, you would've seen that it was at 640x480. He's also using a different cfg. Try running the benchmark at 640x480 and you'll get a better result, although that'll mostly be due to RAM.
Did you even test -threads? Because
1. How is using more than 8 threads supposed to gain performance on a CPU that can only run 8 threads concurrently?
2. If you had actually tested it, you would've seen that there's no benefit beyond 2 threads.
Anyway, the reason I'm posting today is that I looked at TF2 with VTune a while ago and gave up on making a nice post with pictures, so I'll just list what I found.
1. TF2 can indeed use at least 8 threads.
2. Only 3 of these threads actually matter. The execution time is split about 3:1:1 or 60%/20%/20% between those 3 threads. All other threads combined are usually <10%.
3. IPC is poor: the threads that matter are at <0.8 IPC (>1.25 CPI). What appears to be the main game loop is truly terrible at <0.15 IPC (>6.7 CPI), and some are even worse than that.
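For reference, CPI is just the reciprocal of IPC, so the thresholds in point 3 convert directly. A minimal Python sketch; the helper name `cpi_from_ipc` is made up for illustration:

```python
def cpi_from_ipc(ipc: float) -> float:
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    if ipc <= 0:
        raise ValueError("IPC must be positive")
    return 1.0 / ipc


# Thresholds from point 3: the threads that matter and the main game loop.
for label, ipc in [("important threads", 0.8), ("main loop", 0.15)]:
    print(f"{label}: IPC {ipc} -> CPI {cpi_from_ipc(ipc):.2f}")
```

This prints 1.25 CPI for 0.8 IPC and 6.67 CPI for 0.15 IPC. Since the relation is a reciprocal, "below X IPC" always means "above 1/X CPI".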
Some more details regarding 2.:
The most important thread contains the engine / main game loop. The second and third are rendering and the GPU driver. The others seem to be independent tasks that were easy to split off, like audio, though the overhead on those doesn't really seem to be worth it. When I benchmarked earlier, -threads 2 (the GPU driver thread isn't counted) gave me the best results, and performance got steadily worse beyond that. It was within the margin of error, though, which is why I didn't post it originally; I couldn't make sense of it at the time.
This also explains most of my previous results. The 33% (1/3) gain in fps from enabling "multicore rendering" is in line with 1/4 of the work being split off into a separate thread (1/0.75 = 1.33). The performance gain when running single-threaded, or when adding a third core while using -threads 2 (driver + rendering together don't max out one core), can be explained by cache / branch predictor thrashing, which occurs when multiple threads run on the same core (in the single-threaded case, the game and the driver).
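The 1/0.75 arithmetic can be written as a tiny model: frame time is set by whichever thread finishes last, so moving a fraction f of the per-frame work to a second thread that runs fully in parallel gives a speedup of 1/max(1 - f, f). A minimal sketch, assuming perfect overlap and zero synchronization overhead; the function name is hypothetical:

```python
def offload_speedup(offloaded_fraction: float) -> float:
    """Speedup when a fraction of per-frame work moves to a parallel thread.

    Assumes both threads run on separate cores, overlap perfectly within a
    frame, and synchronization costs nothing. Frame time is whichever thread
    takes longer: the work left behind (1 - f) or the offloaded work (f).
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return 1.0 / max(1.0 - offloaded_fraction, offloaded_fraction)


print(f"{offload_speedup(0.25):.2f}")  # rendering split off: 1/0.75
```

With f = 0.25 this prints 1.33, matching the observed ~33% gain. The max() term also shows why offloading more than half the work stops helping: the new thread becomes the bottleneck.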
In summary, Valve didn't parallelize anything; they just went for the low-hanging fruit and split off what was easy to split off. If you could split up the rest (those 60%) perfectly, we could see 3 times the performance, although you'd need at least 5 cores (more likely 6). A more realistic goal would be to parallelize just a portion of that code, but even that could double the fps.
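The 3x and "double the fps" estimates follow from a simple critical-path model: if every thread runs concurrently on its own core, frame time is set by the busiest thread. A minimal sketch using the measured 60/20/20 split; the two "after" scenarios are hypothetical, and the model assumes the main loop's work can be divided into chunks that run concurrently with perfect load balancing and no synchronization overhead, so it is an upper bound:

```python
def frame_time(thread_loads: list[float]) -> float:
    """Critical-path model: the busiest thread sets the frame time.

    Assumes each thread has its own core, all threads overlap perfectly
    within a frame, and synchronization costs nothing.
    """
    return max(thread_loads)


def speedup(before: list[float], after: list[float]) -> float:
    return frame_time(before) / frame_time(after)


# Measured split: main loop 60%, rendering 20%, GPU driver 20%
# (the minor threads, <10% combined, are ignored here).
current = [60.0, 20.0, 20.0]

# Hypothetical: the main loop's 60% divided evenly into three concurrent threads.
ideal = [20.0, 20.0, 20.0, 20.0, 20.0]

# Hypothetical: only half of the main loop's work moved to one extra thread.
partial = [30.0, 30.0, 20.0, 20.0]

print(f"ideal:   {speedup(current, ideal):.1f}x on {len(ideal)} cores")
print(f"partial: {speedup(current, partial):.1f}x on {len(partial)} cores")
```

This gives 3.0x on 5 cores for the ideal case and 2.0x on 4 cores for the partial one. The model ignores the minor threads and any locking overhead, which is one reason a real implementation would likely need more than the 5 cores it suggests.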
It's not too bad, though. Compared with Crysis (same year, multithreading also added later via a patch), it's only slightly worse, and most of that is due to Crysis also splitting off physics, which isn't nearly as big a deal in TF2.
Regarding 3.:
If I had to guess, I'd blame this on Visual Studio. The functions with terrible IPC seem to be mostly bound by the microcode sequencer; in other words, there are some terribly CISCy instructions in there. Crysis suffers just as much from cache and DTLB misses, and its branch mispredicts are even worse than TF2's, but it still gets significantly higher IPC because it's not nearly as bound by the microcode sequencer (0.5% vs 3%, and in TF2 most of that 3% is in the main loop, which is what holds it back, so the impact is even worse than the number suggests). There also seem to be some terrible dependency chains, which again probably wouldn't appear with a different compiler.
Realistically speaking just switching the compiler could improve performance by 50%.
EDIT:
A certain someone posted this on reddit, so I'm copying one of my replies here because it contains some information on why I think a different compiler would help:
diegodamohill: So... I bet there's more to it, if it was simple as that these problems wouldn't exist anymore, so i see three explanations:
1 - TF2 dev team is just dumb, so they didn't notice/know how to fix it.
2 - The actual code is more complicated and messier than banking software written in Pascal, so it would cost too much (or even be impossible) to correct everything, considering they only have a handful of people
3 - TF2 dev team doesn't care, and we are doomed.
take your bets boys.
Setsul: It's a mix of all 3 (which one in brackets). That version of the Source engine was originally single-threaded; remember, "multicore rendering" was added later through an update. So parallelizing anything that's not completely independent is a massive pain in the ass because the engine isn't built to deal with locks and race conditions (2).
You could probably still find some parallelism, or just bite the bullet and do what is essentially rewriting parts of the engine. But anyone who knows how (1) or is willing to do that (3) is probably working on Source 2. Since there were rumours of TF2 being ported, and of Source 2 being specifically built to semi-automatically port Source 1 games, I'd expect their reasoning to be "It's not worth putting any effort into making multithreading in Source 1 better; just improve Source 2 and get it ready asap so all games can enjoy the benefits" (3). It does make sense and I can't blame them for it, though Source 2 being Valve-time late does make it a bit awkward.
The last thing, about the compiler, is similar. They're using Visual Studio if I'm not mistaken (pretty much everyone does), and there's nothing wrong with it, but the compiler just performs horribly at times. Most of the time it does a reasonably good job, so it's probably a matter of "if it ain't broke, don't fix it" (3). Still, when I looked at it, the parts that hold performance back do so not so much because of stalling but because there is almost no ILP to be found. The best example is what I think might be the main game loop with its terrible 6.7+ CPI. Sure, it's stalling 50% of the time, which isn't good, but during the 50% that it isn't stalling, only one instruction gets done in >90% of the cycles. I can't help but wonder what would happen if they used the Intel compiler with Visual Studio (which is possible).
EDIT: previous posts:
http://www.teamfortress.tv/post/488391/tf2-benchmarks
http://www.teamfortress.tv/post/530699/tf2-benchmarks