mastercomfig - fps/customization config

#91

mastercoms

4 Frags – +

[quote=Setsul]
It doesn't have to be system specific.
[/quote]
Sure, but it would be more optimal if it were system specific. However, I like your idea of using a default value for min heap size of 196, since that's still larger but not too large.

[quote=Setsul]
No, it won't fade any faster. With -5000 it fades, still being rendered until 5000 except with alpha involved, with 0 it'll be fully opaque and just disappear at 5000.
[/quote]
Sorry if my wording was confusing. I meant I wanted it to fade since it will be basically invisible at like 4000 or something, which reduces clutter. With 5000 to 5000, it will still be opaque at 4000, and disappear at 5000.
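
The difference between the two settings can be sketched numerically. This is a minimal model that assumes opacity falls off linearly between the fade start and the cutoff; the thread does not state the engine's actual fade curve, and `fade_alpha` is a hypothetical helper, not an engine function.

```python
def fade_alpha(distance, fade_start, fade_end):
    """Opacity in [0, 1] under an assumed linear fade (hypothetical model)."""
    if distance >= fade_end:
        return 0.0  # at or past the cutoff the object is not drawn
    if fade_end <= fade_start or distance <= fade_start:
        return 1.0  # no fade range, or not inside it yet: fully opaque
    return (fade_end - distance) / (fade_end - fade_start)

# Fade range from -5000 to 5000: already faint at 4000.
print(fade_alpha(4000, -5000, 5000))
# Fade start equal to the cutoff: opaque at 4000, gone at 5000.
print(fade_alpha(4000, 5000, 5000))
print(fade_alpha(5000, 5000, 5000))
```

Under that assumption the object is at about 10% opacity at 4000 with the -5000 setting, while with 5000 to 5000 it stays fully opaque until it disappears.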

[quote=Setsul]
You didn't explain why otherwise the colour buffer would be read and why it would not be read when set to a solid colour.
[/quote]
The driver will see if you call a clear on the color buffer for that frame, and if so, then it won't read what you last drew to the buffer again.

[quote=Setsul]
You didn't explain how writing to the colour buffer instead of reading from it would speed things up.
[/quote]
On modern hardware, clearing the color buffer is faster than the read. Sure, it would be most ideal to tell the driver to just invalidate the last buffer, but as far as I can tell, the Source Engine wasn't written for that.

[quote=Setsul]
You didn't explain why that would happen on TBIMR like Maxwell/Pascal but not on standard IMR.
[/quote]
Because in order to stitch tiles together efficiently, the GPU has to have a different understanding of the final scene for the next frame. It doesn't know if the draw this upcoming frame will draw on every pixel, so it preemptively reads from the last buffer.

#92

ZeRo5

4 Frags – +

btw why is tf_time_loading_item_panels set to 0 it doesn't save that much fps as far as I know and it makes switching weapons nearly impossible

#93

mastercoms

3 Frags – +

[quote=ZeRo5]btw why is tf_time_loading_item_panels set to 0 it doesn't save that much fps as far as I know and it makes switching weapons nearly impossible[/quote]

It is set to 0 for maximum frames, though I agree it makes things inconvenient. I'll try a lower value than in the main config, but something that will still allow for loading item panels. I've just found it decreases FPS when you're spectating someone.

EDIT: updated maxframes config with 0.00007, but didn't have time to test it.

#94

Setsul

-8 Frags – +

That's literally why these settings exist.

It must still initialize the buffer with that colour or the colour won't show up if it's not overdrawn. Otherwise gl_clear_randomcolor wouldn't work. There are commands to invalidate the framebuffer, glclear is not one of them.
Even then, assuming it would work like you said, why would it only speed up tiled renderers?
Tiles aren't stitched together either. A pixel either is in one tile, or it isn't. It can't be in 2 tiles either.
A GPU will never know how the scene looks before rendering it. It has to start drawing before all draw calls are finished. These are not deferred renderers, they are immediate mode renderers. Instead of the usual IMR MO "get drawcall, rasterize whole triangle, repeat" a TBIMR buffers a bit of geometry, then rasterizes a tile (actually multiple in parallel), ignoring whichever parts of a triangle (or even a whole one) are not inside this tile, then moves on to the next tile. It doesn't know how many triangles there will be and where they will be and it doesn't care. It doesn't have to stitch tiles together.
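
The "buffer a bit of geometry, then work tile by tile" idea above can be sketched as a toy model. Everything here is an illustrative assumption: the 16-pixel tile size, integer pixel coordinates, and bounding-box binning are not taken from any real GPU, and `tile_of`, `tiles_touched` and `bin_batch` are hypothetical names.

```python
TILE = 16  # hypothetical tile edge in pixels; real tile sizes vary by GPU

def tile_of(x, y):
    """Each pixel maps to exactly one tile, so tiles never overlap."""
    return (x // TILE, y // TILE)

def tiles_touched(triangle):
    """Tiles overlapped by the triangle's bounding box (a conservative bin)."""
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    tx0, ty0 = tile_of(min(xs), min(ys))
    tx1, ty1 = tile_of(max(xs), max(ys))
    return {(tx, ty) for tx in range(tx0, tx1 + 1) for ty in range(ty0, ty1 + 1)}

def bin_batch(triangles):
    """Group a small batch of buffered triangles by the tiles they may touch."""
    bins = {}
    for index, triangle in enumerate(triangles):
        for tile in tiles_touched(triangle):
            bins.setdefault(tile, []).append(index)
    return bins

batch = [[(2, 2), (30, 4), (10, 20)], [(40, 40), (45, 40), (40, 45)]]
for tile, indices in sorted(bin_batch(batch).items()):
    print(tile, indices)
```

Nothing in this model needs to know the whole scene in advance: each batch is binned as it arrives, and a tile only processes the triangles listed for it.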

#95

Pert

-9 Frags – +

its like clockwork (heh).. every few months someone comes out with a new config of their own and every single time it ends up being worse than Comanglia's..

#96

mastercoms

9 Frags – +

[quote=Setsul]
It must still initialize the buffer with that colour or the colour won't show up if it's not overdrawn. [/quote]
Initialize a buffer? You can only clear present buffers. I think you're misunderstanding what buffer clearing does.

[quote=Setsul]
There are commands to invalidate the framebuffer, glclear is not one of them.
[/quote]
AFAIK, there is no Source cvar that makes it invalidate the buffer per frame (with some instruction like glInvalidateFramebuffer, don't know what the DirectX equivalent is).

[quote=Setsul]
A GPU will never know how the scene looks before rendering it.
[/quote]
Never said that. What I was saying is that GPU drivers will expect certain things with the information they have. And what you said is exactly why tiled renderers have to do this. The GPU won't know if you're going to be drawing on every pixel, nor what tiles you're going to cover. Now, I'm not a graphics driver programmer, so I'm not sure exactly about the inner workings of how/why they expect a certain different behavior on tiled rendering, but I know for a fact that the driver does act this way, and that's all I need to know for this config.

[quote=Pert]
its like clockwork (heh).. every few months someone comes out with a new config of their own and every single time it ends up being worse than Comanglia's..
[/quote]
How is it worse than Comanglia's? It gets better FPS as shown in the maxframes benchmarks.

#97

Thole

4 Frags – +

wait so this config has better graphics, but similar fps to comanglias? would it run even faster if you gave it the shittiest graphics possible on top of all the cvars? i won't be home to my computer for a long time, otherwise i would have tried that myself right away, but i am really interested in this config from what I've seen in the thread.

#98

Setsul

-2 Frags – +

It doesn't matter what you want to call it. To set the colour in the colour buffer you have to write to it. Quoting the official documentation from khronos:
[quote]glClear sets the bitplane area of the window to values previously selected by glClearColor, glClearIndex, glClearDepth, glClearStencil, and glClearAccum.[/quote]

Yes, there is no cvar for it because you don't need to do it. gl_clear doesn't do it either.

Please just give me a source for this. Because either you misunderstood completely or can't explain it.
Everything you said so far about the behaviour of tiled renderers and gl_clear is wrong.
Yes, if you invalidate a buffer you don't have to read it.
No, glclear doesn't do it.
No, this doesn't affect tiled renderers differently. Any renderer has to read (read-modify-write actually) valid buffers and can ignore invalid ones.
No, clearing or even invalidating a buffer will not tell you where there will be triangles in the next frame.
No, you can't skip tiles just because you invalidated the buffer.
The whole point of rasterisation is to figure out where the triangles are. Before that you don't know it. You can't know it. And you don't need to know it. Clearing the buffer adds no information. If you write into an existing buffer you write to it. If you write into a new buffer you don't have to pull in cache lines for rmw because everything you don't write will automatically be zero. That's it.
So please, show me a link.

#99

mastercoms

4 Frags – +

[quote=Thole]wait so this config has better graphics, but similar fps to comanglias? would it run even faster if you gave it the shittiest graphics possible on top of all the cvars? i won't be home to my computer for a long time, otherwise i would have tried that myself right away, but i am really interested in this config from what I've seen in the thread.[/quote]

It has a maxframes addon which gives similarly poor visual quality to Comanglia's and better FPS. Though I think it looks better than Comanglia's even with the maxframes addon.

[quote=Setsul]
-snip-
[/quote]
Sorry, but what you just refuted was not at all what I said. You're skipping crucial information, the most important being that the driver will preemptively load the previous buffer because it doesn't know if your next draw is going to cover every pixel. I never said the driver knows where things will be rendered. I am saying that because it doesn't know, it does this. Also, you seem to be confused about where the clear takes place. You're clearing for the next frame, to prevent the driver from reading the buffer from the previous one. The write that clearing does is faster than reading the previous frame on modern hardware.

#100

Setsul

-8 Frags – +

I have no idea how you think this works.

You always draw over the previous buffer unless it's invalidated. glclear does not invalidate buffers.
The driver will preemptively load the previous buffer to where?
Where do you think the solid red or whatever you set for glclear comes from? Every pixel that won't be overwritten needs to have that value.
And still most importantly, what would change if you do tiled rendering?

Really, just link where you got this from.

#101

mastercoms

10 Frags – +

[quote=Setsul]
[/quote]
I'm trying to find the link, but you are obviously misunderstanding what I'm saying.

Let me give you a simple example before I find the link. Let's say you are rendering a simple UI. That UI has a few buttons, and something just triggered a hover effect on one of the buttons. So you would update that part of the UI. The GPU has to read the previous buffer to draw the rest of the UI that you didn't update.

But let's say your draw is covering every pixel. The GPU doesn't know that this draw will cover every pixel, so it will still read the previous buffer, assuming that you won't update every pixel and expect what you drew last to stay there. If you call glClear, the GPU knows that it won't have to read the previous buffer because you've cleared the color buffer. This writing to the color buffer will be faster than the read.

#102

Setsul

-1 Frags – +

It still does nothing for the stencil and depth buffer.
You still have to do the write. Why would a forced write be faster than rmw on demand?
And we're talking about microseconds that it takes to read the buffer.
And you still haven't named a single reason why this would affect tiled renderers differently.

#103

mastercoms

4 Frags – +

The stencil and depth buffers don't need to be dealt with for the driver to understand that it can skip reading the buffer.
It is faster and does this only on tiled renderers because the write is faster on GPUs with tiled rendering.

#104

Setsul

-6 Frags – +

Well it means you can only skip the color buffer.
Anyway, why is the write faster on GPUs with tiled rendering?
I think I finally figured out what you mean (which could've been explained in one sentence), but considering how easy it is to explain and to realize why it doesn't work on Maxwell/Pascal I'm going to wait for a link to make sure you didn't just keep randomly guessing.

#105

mastercoms

11 Frags – +

I've been saying the same thing over and over, just in different ways so you could understand.

Drivers skip reading all buffers if you use glClear. And it might take a bit to find the link, I'm trying to figure out what keywords to use. It might also be on ASM or IEEE so I'll have to look there too if I don't find it from googling it.

#106

Setsul

-3 Frags – +

No, only the colour buffer. Can't skip the others unless GL_DEPTH_BUFFER_BIT, GL_ACCUM_BUFFER_BIT, and GL_STENCIL_BUFFER_BIT are all set.
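
As a small illustration of those mask bits: glClear takes a bitwise OR of buffer flags, so which buffers get cleared depends entirely on the mask passed in. The constants below are the values from the OpenGL headers; `cleared_buffers` is a hypothetical helper for decoding a mask, not part of any API.

```python
# Bitmask values as defined in the OpenGL headers (gl.h).
GL_DEPTH_BUFFER_BIT = 0x00000100
GL_ACCUM_BUFFER_BIT = 0x00000200
GL_STENCIL_BUFFER_BIT = 0x00000400
GL_COLOR_BUFFER_BIT = 0x00004000

BUFFER_BITS = {
    "colour": GL_COLOR_BUFFER_BIT,
    "depth": GL_DEPTH_BUFFER_BIT,
    "accum": GL_ACCUM_BUFFER_BIT,
    "stencil": GL_STENCIL_BUFFER_BIT,
}

def cleared_buffers(mask):
    """Names of the buffers a glClear(mask) call would clear."""
    return [name for name, bit in BUFFER_BITS.items() if mask & bit]

# A colour-only clear leaves depth, accum and stencil untouched.
print(cleared_buffers(GL_COLOR_BUFFER_BIT))
print(cleared_buffers(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT))
```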

Anyway how hard can it be to remember a single word?
I've had to guess so much since apparently it's impossible. That's why I asked
[quote=Setsul]The driver will preemptively load the previous buffer to where?[/quote]
Do you mean the tile cache?
Because mobile/low power/low bandwidth tiled renderers have on chip buffers for colour and depth so writing to them is faster than reading from memory, because you share that with the CPU cores and can count yourself lucky if you get double digits GB/s bandwidth.

Makes perfect sense, the "on modern hardware it's faster" confused me because this has been a thing for 20 years.

It also explains why it only affects tiled renderers.

Except Maxwell/Pascal have neither that bandwidth problem nor a tile cache. They are vastly different from e.g. ARM Mali or even TBDR like PowerVR.
Also even just googling "tiled renderer glclear" gets you a source https://community.arm.com/graphics/b/blog/posts/mali-performance-2-how-to-correctly-handle-framebuffers at the top of the page.
So tell me:
Is this what you meant?
Why do you not know the term "tile cache"?
How basic was your research that you didn't realise that Maxwell/Pascal are vastly different than mobile low power GPUs?

#107

toads_tf

28 Frags – +

what in the fuck is this thread

#108

JohhnyFromCali

-5 Frags – +

Someone post I know some of these words gif!

#109

JackStanley

8 Frags – +

This guy did the great job, give it a shot!

#110

Menachem

17 Frags – +

i keep going back over this thread in a struggle to understand why this turned so hostile

this is the drama tf2 really needs

#111

mastercoms

17 Frags – +

Setsul, this isn't going to be worth my time if you aren't willing to understand what I say or to discuss these things in a non-pretentious manner. But that's not what I meant. Here's the source I meant: http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-TileBasedArchitectures.pdf (23.3 on page 5 or 326 and 23.4 on the next page)

[quote=Menachem]i keep going back over this thread in a struggle to understand why this turned so hostile

this is the drama tf2 really needs[/quote]

I've tried to remain as friendly as possible but Setsul's behavior and discussion habits make it quite hard.

[quote=JackStanley]This guy did the great job, give it a shot![/quote]
Thanks! I'm not a guy though :(

#112

Setsul

2 Frags – +

Why do you still ignore everything I say?

Maxwell/Pascal are tiled immediate mode renderers.
They don't have tile caches.
It doesn't apply.

I know the terminology is confusing, but that's how it is. For many years all tiled renderers have been deferred renderers.*
Maxwell and Pascal aren't like this. They are fundamentally different.

To quote your own source:
[quote]On immediate-mode GPUs, blending is usually expensive because it requires a read-modify-write cycle to the framebuffer, which is held in relatively slow memory. On a tile-based GPU, this read-modify-write cycle occurs entirely on-chip and so is very cheap.[/quote]
[quote]Tile-based GPUs are sometimes referred to as deferred because the driver will try to avoid performing fragment shading until it is required[/quote]
And now this http://www.realworldtech.com/tile-based-rasterization-nvidia-gpus/
[quote]Maxwell and Pascal use tile-based immediate-mode rasterizers[/quote]

Do you see the problem?

*Mali is actually an immediate mode renderer, but the basic structure is still the same, just the order with the depth pass and HSR are different. I know this makes it even more confusing, but that's how it is.

#113

Mould

43 Frags – +

Config mains smh

#114

mastercoms

11 Frags – +

[quote=Setsul][/quote]
Oh ok, sorry. I see. You're right then. I'd like to thank you for walking me through it. I was missing the deferred vs immediate mode bit and why it would matter.

Could you explain this too? https://stackoverflow.com/questions/37335281/is-glcleargl-color-buffer-bit-preferred-before-a-whole-frame-buffer-overwritte/37336947#37336947

[quote=Mould]Config mains smh[/quote]
Excuse you, I have a Hale's Own Text Editor and I'm proud!

#115

Setsul

4 Frags – +

Ok, finally we got that sorted out.
I was in way too deep on the tiled vs deferred thing* (which has been going on for over a decade now) to realize that most people are still used to tiled = deferred, and that the books on the subject, which are often quite old, couldn't even know that this has changed. I guess that arguing in circles was incredibly frustrating for both of us.

So what exactly should I explain? There are still multiple things left.
-Why gl_clear 1 in TF2 isn't really a good idea without going really deep into the rendering code to check first?
-Why glclear doesn't really affect Maxwell/Pascal differently than standard desktop IMR GPUs?
-Why glclear generally affects desktop GPUs less?
-Or the thing that's explained in that stackoverflow post: why GPUs with tile caches benefit from glclear?

I'll probably only get around to answering it tomorrow.

*further complicated by ARM Mali not being deferred but still using tile caches and then led completely ad absurdum by nVidia doing tiling without tile caches and calling it "tiled caching".
https://www.techpowerup.com/img/17-03-01/f34e39b49c7c.jpg

#113
You know exactly that only this level of stubbornness, persistence and "attention to detail"/autism-like focus gets you good configs.

#116

mastercoms

6 Frags – +

Yeah, it was frustrating, but I'm glad it was sorted out. Sorry about that whole social drama thing in #111, I was just kind of frustrated from our conversation.

Anyway, the thing I was asking was the StackOverflow post's claim about fast clears, saying that glClear actually saves write and read time, on any GPU, not just Maxwell and Pascal.

#117

osvaldo

11 Frags – +

Ultimate mastercoms' max fps config featuring big Setsul when?

#118

Setsul

9 Frags – +

In an ideal world where you know and control everything, yes.
In the real world you either have tile caches (for which clearing is basically a requirement) or there will be a lot of ifs.
With tile caches you basically have to do it, because if you just assume everything will be zero and only start reading once it's clear that a part of the tile will be overwritten, you will end up wasting a lot of time. The driver assumes everyone knows what they are doing, so no glClear means you think you'll need the previous buffer, and it will read it.
Also, when you have maybe 10 GB/s, a single full read of a 1920x1080 framebuffer per frame at 60 fps eats up about 5% of your total read budget. When you have 200-500 GB/s you can afford to care a lot less.
With an IMR things are different: if you don't write anything, nothing gets pulled into the cache at all. Of course, if you overwrite everything, then in theory not having to read anything does save time. But then the ifs start.

1. If you don't overwrite everything, it obviously doesn't work, or you have to redraw something, so you might not save time/bandwidth anymore. In the case of TF2 I don't think they do weird stuff like not overdrawing the HUD (and no one cares about the main menu), so we should be safe.
2. glClear can clear multiple buffers. For example, clearing only the stencil or only the depth buffer makes things worse because the two are usually interleaved and you can't pull in partial cache lines, so you end up having to read everything anyway and do a write on top of it. Again we're safe, because gl_clear only sets the flag for the colour buffer and that should be separate.
3. glClear doesn't clear to zero. It clears to whatever GL_ACCUM/COLOR/DEPTH/INDEX/STENCIL_CLEAR_VALUE is set to. You can still invalidate the backing memory, but you now actually have to do a write, and you can't write into nothing, so you need to make room for the cache lines you write to. If the cache lines you have to evict for that have been written to, you need to write them out to memory. That's not any faster than reading from memory, and you still haven't done the on-chip write to the colour buffer. Even if you can just throw away the cache lines because they were unchanged, you might need them for rendering again (there is a reason they were in the cache), so now you have to read them from memory again. There are all sorts of "smart" things the driver could be doing, but I wouldn't rely on undocumented behaviour. Benchmarking it is easier and faster. gl_clear_randomcolor is obviously a debug command and clears to solid colours, but I haven't cared about the Source OpenGL code enough (sorry Linux fanatics, it's just not that important) to check what gl_clear will clear to. Easier to just benchmark.
4. Contrary to popular belief, Valve is not run by idiots. "Smart" clearing isn't all that new; it had been the standard for a few years by the time TF2 was released. When they started "porting" Source to OpenGL it was already impossible to find a GPU that would still do writes for a clear to zero. So they might already be clearing to black/zero by default, and gl_clear 1 sets it to a certain colour instead. Or they might already be clearing every time it's a clear win, whereas gl_clear 1 always forces a glClear, even in situations that are disadvantageous for desktop GPUs, so that the option exists for GPUs with tile caches, where it would still be a win unless a ridiculous amount of redraw is needed.
Again, probably easier to just benchmark it.
Except that, due to the higher bandwidth on desktop GPUs, you might not be able to tell which setting is better, because the difference should be smaller than the variation between runs.
I mean, try gl_clear_randomcolor and see how much it changes.
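
For anyone who wants to check the 5% figure above, here is a rough sketch of the arithmetic. It assumes 4 bytes per pixel (an 8-bit RGBA colour buffer) and decimal gigabytes, neither of which is stated in the post; the function name is made up for illustration.

```python
def framebuffer_read_share(width, height, fps, bandwidth_gb_s, bytes_per_pixel=4):
    """Fraction of memory bandwidth used by one full framebuffer read per frame.

    bytes_per_pixel=4 is an assumption (8-bit RGBA), not something the post states.
    """
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second / (bandwidth_gb_s * 1e9)


# 10 GB/s is the low-bandwidth case from the post, 200-500 GB/s the desktop case.
for bandwidth in (10, 200, 500):
    share = framebuffer_read_share(1920, 1080, 60, bandwidth)
    print(f"{bandwidth:>3} GB/s: {share:.2%} of bandwidth")
```

Under those assumptions one read per frame costs roughly 0.5 GB/s, which is about 5% of 10 GB/s but only a fraction of a percent at 200-500 GB/s.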

#117
At this point it's a definite maybe.

#119

JackStanley

0 Frags – +

So I ran a benchmark myself, and there seems to be almost no FPS difference between Comanglia's and mastercoms' (stability/maxframes) configs for me. Keep in mind I had my Opera browser open with power save mode, and I use LOD Tweak, dxlevel 80, and removed gibs, ragdolls, etc. for both of them.
In-game mods: http://imgur.com/a/pAk9b, Specs: https://pastebin.com/xHdJVTKL

Benchmark results:

2639 frames 22.568 seconds 116.94 fps ( 8.55 ms/f) 8.382 fps variability
comanglia

2639 frames 22.860 seconds 115.44 fps ( 8.66 ms/f) 8.489 fps variability
masterconfig
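
A minimal sketch of how the two result lines above could be compared. The regex and the names in it are made up for illustration, and using the reported "fps variability" as a noise yardstick is an assumption: it describes variation within a run, so a proper comparison would still need several runs per config.

```python
import re

# Matches one benchmark summary line in the format pasted above.
TIMEDEMO = re.compile(
    r"(\d+) frames\s+([\d.]+) seconds\s+([\d.]+) fps\s+"
    r"\(\s*([\d.]+) ms/f\)\s+([\d.]+) fps variability"
)


def parse_timedemo(line):
    """Turn one summary line into a dict of numbers."""
    match = TIMEDEMO.search(line)
    if match is None:
        raise ValueError(f"not a benchmark summary line: {line!r}")
    frames, seconds, fps, ms_per_frame, variability = match.groups()
    return {
        "frames": int(frames),
        "seconds": float(seconds),
        "fps": float(fps),
        "ms_per_frame": float(ms_per_frame),
        "variability": float(variability),
    }


# Labels are the ones used in the post.
runs = {
    "comanglia": parse_timedemo(
        "2639 frames 22.568 seconds 116.94 fps ( 8.55 ms/f) 8.382 fps variability"
    ),
    "masterconfig": parse_timedemo(
        "2639 frames 22.860 seconds 115.44 fps ( 8.66 ms/f) 8.489 fps variability"
    ),
}

gap = runs["comanglia"]["fps"] - runs["masterconfig"]["fps"]
relative_gap = gap / runs["masterconfig"]["fps"]
noise = min(run["variability"] for run in runs.values())

print(f"gap: {gap:.2f} fps ({relative_gap:.1%})")
print(f"smallest reported variability: {noise:.3f} fps")
print("gap is within variability" if abs(gap) < noise else "gap exceeds variability")
```

With these two lines the gap is about 1.5 fps against a reported variability of over 8 fps, so a single run per config cannot separate the two.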

#120

mastercoms

0 Frags – +

JackStanley: So I ran a benchmark myself, and there seems to be almost no FPS difference between Comanglia's and mastercoms' (stability/maxframes) configs for me. Keep in mind I had my Opera browser open with power save mode, and I use LOD Tweak, dxlevel 80, and removed gibs, ragdolls, etc. for both of them.

How did you remove gibs and ragdolls?

What is LOD tweak?

And I recommend using the highest dxlevel supported by your card. But it comes down to what the benchmarks say, and personal preference. There are some things in dx8 that hurt performance depending on your specs. It makes up for it by disabling some effects not supported by it, but I think most of the important effects can be disabled in dx9 anyway.

Setsul

Hm, so I guess it comes down to benchmarking it with some perf tool.
