Evaluating good play from stats/logs
31
#31
2 Frags +
[quote=Hellbent]It is kind of worrisome that teams look for people with high damage outputs or good k/d's to consider them for teams now. How should you play in your tryout? Try to get the most damage out by baiting/dive bombing at retarded times, or play a smarter game?[/quote]

If K/D and DPM aren't as accurate, can you think of any other stats that would indicate "a smarter game"?
32
#32
4 Frags +

This whole topic is really interesting. Dunno if anyone follows the NHL much, but there's a very similar debate going on there about the merits of advanced stats vs. traditional stats or the "eye test". A lot of the pros/cons going back and forth are very similar.

I don't think stats are going to tell you everything, but they don't have to in order to be useful. All they have to do is show trends and give you more information to work with, and if you're open to interpreting them, they can be very useful.

"You can't disagree with the numbers, you can only disagree with the conclusion." - Stan Bowman
33
#33
7 Frags +

I think that most of the discussion in this thread is shamefully off topic. Instead of trying to derive meaning from OLD stats (kd,dpm), we should be trying to make NEW stats that make sense based off of our collective competitive experience.

Such stats should (1) define a specific situation, (2) have a measurable outcome, and (3) allow for good interpretation. Example using one of the stats I sketched out, Average Health at Uber Pop: (1) uber fights, (2) health at start of fight, (3) more health is better. Cut and dried. Compare this to DPM: (1) non-specific, (2) general damage, (3) discussion in this thread has determined that interpretation is murky at best. I believe that my new stats work towards showing UNDOUBTEDLY GOOD play and paint a CLEAR picture of a competitive match. My stats are not without their flaws, but they are a step in the right direction.

I have thought of a few more stats to share:

[list]Even Uber Exchanges (EUE). This is both a stat and a condition for other stats. If ubers are popped within 4-5 seconds of each other, it is an Even Uber Exchange.
[/list]

[list]Stat 1 of EUE: Average Time Between Ubers (in an EUE). Also called "milking," this describes how long the other medic holds his uber before popping. Positive and negative values should be kept separate to distinguish between offensive and defensive ubering. In an EUE, it is best for the offensive uber to pop the defensive uber as quickly as possible.
[/list]

[list]Stat 2 of EUE: Average Flashes (in an EUE). Every time an uber is switched between targets, the uber gets shorter. In an EUE, the uber length is tantamount to damage done. Even in 1 second, a pocket soldier can put out 200+ damage to an unubered target. Combined with Stat 1 and Average Health at Uber Pop, you can show how often you have "good ubers" or "bad ubers".
[/list]

[list]Stat 3 of EUE: Damage Dealt and Taken (in an EUE). All damage during EITHER uber in an EUE is considered (or 13 seconds after the first uber). I actually consider this more of a flank stat than a combo stat. A flank that does considerably more damage than the other flank will have an advantage post-uber. This also eliminates the need for a stat such as "Health After EUE."
[/list]

I call on each of you to examine Team Fortress 2. Put a name on important plays in a match. Define it. Flesh it out. Come up with creative solutions to express it. Logs is a bit limiting but you can still work around it.
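
To make the EUE definition concrete, here is a minimal sketch of how it could be computed once the uber pops have been pulled out of a log. The `UberPop` record, the function names and the 5-second default are assumptions for illustration (the post only says "4-5 seconds"); this is not an existing logs parser.

```python
from dataclasses import dataclass

EUE_WINDOW = 5.0  # seconds; assumed upper end of the "4-5 seconds" in the post


@dataclass
class UberPop:
    time: float  # seconds since round start (hypothetical field)
    team: str    # e.g. "Red" or "Blue"


def even_uber_exchanges(pops, window=EUE_WINDOW):
    """Pair each uber with the next enemy uber popped within `window` seconds."""
    pops = sorted(pops, key=lambda p: p.time)
    exchanges = []
    i = 0
    while i < len(pops) - 1:
        first, second = pops[i], pops[i + 1]
        if first.team != second.team and second.time - first.time <= window:
            exchanges.append((first, second))
            i += 2  # both ubers belong to this exchange
        else:
            i += 1
    return exchanges


def average_time_between_ubers(exchanges):
    """Mean gap between the first and second pop; None if there were no EUEs."""
    gaps = [second.time - first.time for first, second in exchanges]
    return sum(gaps) / len(gaps) if gaps else None
```

The other EUE stats (flashes, damage dealt and taken) could then be restricted to the time windows this returns.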
34
#34
2 Frags +

A neat stat would be uber forces by a player, but I can't think of how you could measure if a player forced him or not.
35
#35
1 Frags +
[quote=kace]Example using one of the stats I sketched out, Average Health at Uber Pop: (1) Uber Fights, (2) health at start of fight, (3) more health is better. Cut and dry.
I have thought a few more stats to share:

[list]Even Uber Exchanges (EUE). This is both a stat and a condition for other stats. If ubers are popped within 4-5 seconds of each other, it is an Even Uber Exchange.
[/list]

[/quote]
1. Team health before ubering
2. Length of uber/Uber milking
3. Team damage during uber

I think this would definitely be useful to see on paper. You'd be able to see clearly if you're not buffing your whole team enough before exchanging or if your flank always gets destroyed during uber fights and why you never win the post ubers.
36
#36
2 Frags +
[quote=MiNi]A neat stat would be uber forces by a player, but I can't think of how you could measure if a player forced him or not.[/quote]

A simplification could be this: if a medic received damage within the last 1 or 2 seconds before "chargedeployed" was triggered in the log file and his health is below a threshold (say, 50%?), it could be safe to assume that the damage forced him to pop. If he received damage from multiple enemies, there are two options here:
[olist]
[*] assume a force pop from the enemy who caused the most damage (probably the best option)
[*] assume a force pop from the enemy who caused the last damage
[/olist]

Any thoughts?
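
For what it's worth, the rule above could be sketched roughly like this. The function name, the tuple layout and the assumption that the medic's health at pop time is available are all hypothetical; nothing here is taken from an existing plugin.

```python
from collections import defaultdict


def forced_by(pop_time, medic_health, medic_max_health, damage_events,
              window=2.0, health_threshold=0.5, rule="most"):
    """Guess who forced an uber, or return None if it does not look forced.

    damage_events: iterable of (time, attacker, damage) dealt to the medic.
    rule="most" credits the top damager in the window (option 1),
    rule="last" credits whoever hit the medic last (option 2).
    """
    # Not a force if the medic was still at or above the health threshold.
    if medic_health >= health_threshold * medic_max_health:
        return None

    recent = [(t, who, dmg) for t, who, dmg in damage_events
              if pop_time - window <= t <= pop_time]
    if not recent:
        return None

    if rule == "last":
        return max(recent, key=lambda e: e[0])[1]

    totals = defaultdict(int)
    for _, who, dmg in recent:
        totals[who] += dmg
    return max(totals, key=totals.get)
```

Both the window and the threshold are parameters so they can be tuned against matches where people agree on who forced what.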
37
#37
1 Frags +
[quote=MiNi]A neat stat would be uber forces by a player, but I can't think of how you could measure if a player forced him or not.[/quote]
you could record the last player(s) to damage the medic before he pops

although this would skew the results for the pocket, because when he pushes he'll damage the enemy medic to initiate an uber exchange which doesn't really count as a force
38
#38
0 Frags +
[quote=AloSec][quote=MiNi]A neat stat would be uber forces by a player, but I can't think of how you could measure if a player forced him or not.[/quote]
you could record the last player(s) to damage the medic before he pops

although this would skew the results for the pocket, because when he pushes he'll damage the enemy medic to initiate an uber exchange which doesn't really count as a force[/quote]

I could argue that this would NOT skew the results for the pocket, as it is his job to force the enemy medic during an uber exchange. Anyway, let's try to keep the discussion focused on the stats themselves, since their interpretations are always going to be up for debate. @mansfield7 posted a nice quote earlier that sums this up:

[quote=mansfield7]"You can't disagree with the numbers, you can only disagree with the conclusion." - Stan Bowman[/quote]
39
#39
5 Frags +

damage dealt/heals received ratio for demos/pockets
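
As a quick sketch of that ratio (the function name is made up, and it assumes heals received per player is available in the log at all):

```python
def damage_per_heal(damage_dealt, heals_received):
    """Damage dealt per point of healing received; None if the player got no heals."""
    if heals_received == 0:
        return None
    return damage_dealt / heals_received
```

A value above 1 would mean the player dealt more damage than the healing invested in them.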
40
#40
1 Frags +

Damage per shot taken would be cool, but I don't think it's possible.
41
#41
12 Frags +

can't win if you don't cap 8)
42
#42
2 Frags +
[quote=Lexx][quote=MiNi]A neat stat would be uber forces by a player, but I can't think of how you could measure if a player forced him or not.[/quote]

A simplification could be this: if a medic received damage within the last 1 or 2 seconds before "chargedeployed" was triggered in the log file and his health is below a threshold (say, 50%?), it could be safe to assume that the damage forced him to pop. If he received damage from multiple enemies, there are two options here:
[olist]
[*] assume a force pop from the enemy who caused the most damage (probably the best option)
[*] assume a force pop from the enemy who caused the last damage
[/olist]

Any thoughts?[/quote]

Uber forces have more to do with intention than actual results. A sniper can force a medic without ever hitting a shot just because he is LOOKING at the medic. In addition, some forces are the result of pressure applied to players other than the medic. This is called "saving." Furthermore, in unequal uber situations, a medic might want to pop through a choke point to capitalize on the other team's horrendous positioning. I don't really call it a force when you kill the entire combo afterwards. All in all, this makes it rather hard to examine from a statistical standpoint.

However, I can posit a different approach. Let a medic be forced when he ubers and fewer than 2 people die on the other team (can be 0) during the uber, plus 1 second before (roamer rockets after death/fall damage) and 3 seconds after (follow-up). In effect, this really says that you had a bad push, but I consider it a good enough proxy. In addition, it gives no credit to any individual "forcer", but eh.
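
A rough sketch of that proxy, assuming you already have the pop time and the enemy death times in seconds. The names are hypothetical, and the 8-second default is the length of a standard uber with no flashes; a flashed uber would need its real duration passed in.

```python
UBER_DURATION = 8.0  # seconds, standard uber with no flashes (assumed default)


def is_forced_uber(pop_time, enemy_death_times, uber_duration=UBER_DURATION,
                   before=1.0, after=3.0, min_kills=2):
    """True if fewer than `min_kills` enemies died around the uber.

    The window runs from `before` seconds ahead of the pop to `after`
    seconds past the end of the uber, as described in the post.
    """
    start = pop_time - before
    end = pop_time + uber_duration + after
    kills = sum(1 for t in enemy_death_times if start <= t <= end)
    return kills < min_kills
```

As the post says, this flags bad pushes in general and gives no credit to an individual forcer.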
43
#43
5 Frags +
[quote=Ggglygy]can't win if you don't cap 8)[/quote]

I was actually thinking something very similar.

In baseball, the advanced sabermetrics are evaluating one thing and one thing only: runs scored.

Ultimately, the only thing that truly matters in TF2 is last point caps. Everything else is an indirect measure of a team's effectiveness at capping the last point. It would be interesting to frame these stat analyses by evaluating how good a predictor of last point caps each stat is.
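
One simple way to check "how good a predictor" a stat is would be to correlate it, round by round, with a 0/1 flag for whether the team capped last. This is only a sketch using a plain Pearson correlation; the function and the input layout are assumptions, and a proper analysis would want many rounds and probably a better model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    xs: a per-round stat (e.g. damage difference between the teams).
    ys: 1 if the team capped last that round, else 0.
    Returns None when either sequence has no variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

Stats whose correlation stays near zero across many matches would be poor candidates, however good they look on a scoreboard.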
44
#44
1 Frags +

http://www.vanillatf2.org/2013/08/i49-by-the-numbers/

Anyone know what the equation is for figuring out the heal-adjusted damage per minute?
45
#45
1 Frags +

I really hate how much this game has become stat-based.

Oh man 300 dpm as roamer I must be good.
(jk I just spammed chokes all game and did fuck all else)
46
#46
1 Frags +
[quote=Hellbent]http://play.esea.net/index.php?s=stats&d=match&id=3648984

I remember watching this at the half. The jews were leading 3-1 while doing 3000 less damage and having 25 less frags. [/quote]
This usually tells you that the winning team got massively smashed on the rounds they lost, barely registering a frag, but the rounds they won were really close and hard fought, so they didn't make the statistical gap back up.

Except on Viaduct where suicide waves render kill stats meaningless because of the spawn timers. Then you have to look at ubers to see how the suicides pay off.
47
#47
1 Frags +
[quote=kaidus]damage dealt/heals received ratio for demos/pockets[/quote]
You can't compare unless everyone uses the advanced medic stats plugin, which even now doesn't happen all the time in prem matches.
48
#48
2 Frags +

The stats systems are great.

What I would like to see is something like what Valve did years ago: an overhead view of the map that showed where most of the kills took place. Something like that on a round-by-round basis would be very useful.
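
If kill positions can be extracted (this assumes your logs carry position data for kills, which is not a given), the overhead view boils down to binning coordinates into a grid. A minimal sketch, with a made-up function name and an arbitrary cell size:

```python
from collections import Counter


def kill_density(positions, cell_size=256):
    """Count kills per grid cell.

    positions: iterable of (x, y) map coordinates where kills happened.
    Returns a Counter keyed by (cell_x, cell_y).
    """
    grid = Counter()
    for x, y in positions:
        grid[(int(x // cell_size), int(y // cell_size))] += 1
    return grid
```

Running it once per round and drawing the counts over a map overview image would give the round-by-round picture described above.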
49
#49
0 Frags +
[quote=pine_beetle]The stats systems are great.

What I would like to see is something like what Valve did years ago: an overhead view of the map that showed where most of the kills took place. Something like that on a round-by-round basis would be very useful.[/quote]
I was just thinking about this. It would be awesome to see where all the weak spots/strong spots of every team are.
50
#50
Momentum Mod
6 Frags +
[quote=Lexx][quote=Hellbent]It is kind of worrisome that teams look for people with high damage outputs or good k/d's to consider them for teams now. How should you play in your tryout? Try to get the most damage out by baiting/dive bombing at retarded times, or play a smarter game?[/quote]

If K/D and DPM aren't as accurate, can you think of any other stats that would indicate "a smarter game"?[/quote]
No.

The right way of determining the skill level of a player is to watch a demo of them where they feel they played well. I'm sorry, this is a thread about stats, so I should probably talk about what stats are good for.

I can't deny that I got a boner when I saw rando doing 382 dpm, or clock doing 372, or serv0 doing 410. I would say that from a spectator's view they matter to people just for entertainment purposes, but they really don't mean anything to a player reviewing the match.
51
#51
-1 Frags +

Hellbent, it doesn't take rocket appliances to understand that if a player consistently has a high dpm [u]and[/u] k/d, they are a good player. Sure, you can backtrack and look at recordings of their play, but that's not necessarily indicative of how they will do on your team. If they have high dpm and k/d on their old teams, and continue to have it on your team, you can completely skip the demo review crap altogether.
52
#52
0 Frags +
[quote=Ggglygy]can't win if you don't cap 8)[/quote]
if you get more airshots than the other team, you've won the real game
53
#53
1 Frags +
[quote=Hellbent]
The right way of determining the skill level of a player is to watch a demo of them where they feel they played well. I'm sorry, this is a thread about stats, so I should probably talk about what stats are good for.

I can't deny that I got a boner when I saw rando doing 382 dpm, or clock doing 372, or serv0 doing 410. I would say that from a spectator's view they matter to people just for entertainment purposes, but they really don't mean anything to a player reviewing the match.[/quote]

If you're 100% right, then people doing machine learning and artificial intelligence would be out of a job... I do get where you're coming from; however, let me try to change your mind about this a little bit. The general argument that you're making is something like "it takes a human being to do X / to draw a conclusion about Y". Think of the task of diagnosing a skin lesion and identifying whether it is a malignant tumor or not. Up until recently, the first step of this process required a trained professional to actually take a look at the lesion and decide whether it looks suspicious and whether further investigation is required. Now, we've figured out how to [url=http://en.wikipedia.org/wiki/Computer_vision]train a computer[/url] to analyze images of skin lesions and look for clues (shape, size, color, texture, etc.) that would indicate a potentially malignant tumor. These systems are not perfect, obviously; however, they're becoming more and more helpful, and in the end their utility is what really matters. The general idea for this and for my own TF2 app is this:

[olist]
[*] understand the type of reasoning that a human expert would use for a particular problem;
[*] break that reasoning down into manageable steps and identify the clues (performance metrics) that are needed;
[*] "teach" a computer to apply the same reasoning for that type of problem.
[/olist]

This thread is all about the second step in that list. I get that you don't think DPM or K/D ratios would be all that helpful in determining good play. But this doesn't automatically mean that the only way to do this is to have a dude analyze the demo/stream himself.
54
#54
2 Frags +
[quote=TaKoCheese]http://www.vanillatf2.org/2013/08/i49-by-the-numbers/

Anyone know what the equation is for figuring out the heal-adjusted damage per minute?[/quote]

I'd like to know this formula as well; I didn't manage to deduce it by myself after reading that link. Hopefully @GentlemanJon is still around.
55
#55
0 Frags +
[quote=Lexx]If you're 100% right, then people doing machine learning and artificial intelligence would be out of a job... I do get where you're coming from; however, let me try to change your mind about this a little bit. The general argument that you're making is something like "it takes a human being to do X / to draw a conclusion about Y". Think of the task of diagnosing a skin lesion and identifying whether it is a malignant tumor or not. Up until recently, the first step of this process required a trained professional to actually take a look at the lesion and decide whether it looks suspicious and whether further investigation is required. Now, we've figured out how to [url=http://en.wikipedia.org/wiki/Computer_vision]train a computer[/url] to analyze images of skin lesions and look for clues (shape, size, color, texture, etc.) that would indicate a potentially malignant tumor. These systems are not perfect, obviously; however, they're becoming more and more helpful, and in the end their utility is what really matters. The general idea for this and for my own TF2 app is this:

[olist]
[*] understand the type of reasoning that a human expert would use for a particular problem;
[*] break that reasoning down into manageable steps and identify the clues (performance metrics) that are needed;
[*] "teach" a computer to apply the same reasoning for that type of problem.
[/olist]

This thread is all about the second step in that list. I get it that you don't think that DPM or K/D ratios would be that much helpful in determining good play. But this doesn't automatically mean that the only way to do this is to have a dude analyze the demo/stream himself.[/quote]
You're misreading Hellbent's post; he was only saying that stats alone (as they are right now) are not a very good measure of performance. He didn't imply that it's impossible. Without more comprehensive stats or a way to read demos (and reliable GET demos), any automated analysis is going to be faulty by nature.
56
#56
1 Frags +

[quote=mana]You're misreading Hellbent's post; he was only saying that stats alone (as they are right now) are not a very good measure of performance. He didn't imply that it's impossible. Without more comprehensive stats or a way to read demos (and reliable GET demos), any automated analysis is going to be faulty by nature.[/quote]

Well, this is why I started this thread in the first place: to discuss the existing stats and figure out new/better ones. So let's first agree that automatically analyzing video stream data is cumbersome and should be the last resort. Extracting data from STV demos is also impractical, since nobody knows the structure of the network packets that are dumped into those files. So that leaves us with [url=http://pastebin.com/a0fXRSYD]whatever we can get out of log files[/url]. This is what I'm trying to work with right now. And if this isn't good enough, then the next question is: what would be needed instead? What other data would be valuable to have inside a log file? Once we identify that, it's only a matter of time before someone knowledgeable about [url=http://logs.tf/about]TF2 server plugins[/url] can update them to dump this new data into the logs.
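
To give a rough idea of what "working with log files" can look like, here's a minimal sketch that sums damage events per player and turns them into DPM. The line shape in the comment, the player names, and the function name damage_per_minute are all assumptions for illustration; real logs may differ depending on which plugin wrote them.

```python
import re
from collections import defaultdict

# Assumed line shape (hypothetical sample, not copied from a real match log):
# L 10/12/2013 - 20:15:32: "PlayerA<3><STEAM_0:1:111><Red>" triggered "damage" (damage "45")
DAMAGE_RE = re.compile(
    r'^L \d{2}/\d{2}/\d{4} - \d{2}:\d{2}:\d{2}: '
    r'"(?P<name>.+?)<\d+><[^>]*><[^>]*>" '
    r'triggered "damage" .*\(damage "(?P<damage>\d+)"\)'
)


def damage_per_minute(lines, minutes):
    """Sum the damage events per player and divide by the match length."""
    totals = defaultdict(int)
    for line in lines:
        match = DAMAGE_RE.match(line)
        if match:  # every non-damage line (chat, kills, ...) is skipped
            totals[match.group("name")] += int(match.group("damage"))
    return {name: total / minutes for name, total in totals.items()}


# Made-up lines that follow the assumed shape above.
sample = [
    'L 10/12/2013 - 20:15:32: "PlayerA<3><STEAM_0:1:111><Red>" triggered "damage" (damage "45")',
    'L 10/12/2013 - 20:15:40: "PlayerB<4><STEAM_0:0:222><Blue>" triggered "damage" (damage "120")',
    'L 10/12/2013 - 20:16:02: "PlayerA<3><STEAM_0:1:111><Red>" triggered "damage" (damage "55")',
    'L 10/12/2013 - 20:16:10: "PlayerB<4><STEAM_0:0:222><Blue>" say "gg"',
]

dpm = damage_per_minute(sample, minutes=2)
print(dpm)
```

Any new event a plugin starts writing could be picked up the same way, with one more pattern per event type.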
57
#57
1 Frags +

[quote=Lexx]I'd like to know this formula as well, I didn't manage to deduce it by myself after reading that link. Hopefully @GentlemanJon is still around.[/quote]
I can't remember exactly. A basic ratio produces weird (i.e. obviously wrong) results, so I did something to produce a 'damping factor', but I can't remember what. Something to do with standard deviations, maybe?
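
Since I can't reconstruct what I actually did, the following is not the original formula. It's just one generic way to damp a ratio, assuming the stat is something like damage dealt per point of healing received: add a fixed amount of "prior" healing at a typical ratio, so that players with very little healing are pulled toward the typical value instead of producing extreme numbers. The names damped_ratio, prior_ratio and prior_weight, and all the numbers, are made up for illustration.

```python
def damped_ratio(damage, heals, prior_ratio, prior_weight):
    """Damage per heal, shrunk toward prior_ratio.

    prior_weight is how much 'imaginary' healing at the typical ratio is
    mixed in; it must be positive so that the denominator is never zero.
    """
    return (damage + prior_ratio * prior_weight) / (heals + prior_weight)


# Hypothetical numbers: a roamer who barely got healed vs. a heavily healed pocket.
barely_healed = damped_ratio(damage=900, heals=100, prior_ratio=1.5, prior_weight=2000)
heavily_healed = damped_ratio(damage=9000, heals=6000, prior_ratio=1.5, prior_weight=2000)
print(barely_healed, heavily_healed)
```

The raw ratio for the first player would be 9.0, which is exactly the kind of obviously wrong result that needs damping.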

Anyway, although you'll be far more likely to take Kaidus's word for this over mine, it is largely a useless stat. There are a lot of reasons for this but I'll explain the most obvious ones.

Firstly, the relationship between pocket (soldier, demo, whatever) and medic isn't one-way: the pocket has a duty to protect the medic as well. The better your medic is at staying alive in isolation, the less this matters (in MiG this season in ETF2L, Ipz had to play with Thun on medic, who was very vulnerable, and Ipz's stats were suppressed by having to look after him more), but the fact is that it's not simply a matter of getting heals to do damage. Players also get heals to protect the medic and get uber, so the basic premise is excessively simplified.

Secondly, I looked at this in some detail a couple of seasons ago and decided not to include it (I think I published my thoughts at the time but can't remember). When one team is getting badly beaten, it produces totally useless results because the losing team never kills the enemy medic. The particular game with an enormous outlier was Epsilon vs CauseWeCan on Metalworks. Epsilon crushed them, and Mike put in one of the most statistically dominant performances on Soldier it's possible to imagine, but because the Epsilon medic almost never died, the level of healing he received dwarfed even that. Looking at the relationship between healing and damage/frags in the context of that game actually made Mike look bad despite being supreme.

So to look at this relationship you'd need to come up with something that took into account who was responsible for protecting the medic in a particular team and to what extent, how well the medic survives in isolation, how well that team exploited their healing in relation to getting uber, and how much of a challenge the opposing team was able to mount and its relationship to healing. There are probably other considerations. I haven't looked at this in detail because it's very hard, the sample size is tiny, and the chances of anyone else understanding it are zero. I generally investigate stats that I think will make reasonable reading or that interest me.
58
#58
1 Frags +

@GentlemanJon, thanks for the comments. I'm hoping that a good combination of stats which account for damage/frags, heals, ubers and caps would provide enough variation to cover most of the scenarios described in the thread. Ultimately, I'm only interested in the predictive power of the model, even if the successful combination of weights/features would be hard for a human to interpret.
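
To make "a combination of stats with weights" a bit more concrete, here's a minimal sketch of the kind of model I have in mind: a plain logistic regression that learns one weight per stat and outputs a win probability. This is only an illustration, not the actual app; the feature layout (scaled per-match differences in damage, heals, ubers and caps between the two teams) and all the numbers are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability of a win for one row of stat differences."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, labels, lr=0.1, epochs=2000):
    """Fit the weights with plain stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Invented toy data: [damage, heals, ubers, caps] differences, scaled to about [-1, 1].
rows = [
    [0.8, 0.5, 0.3, 0.6],
    [0.4, 0.2, 0.5, 0.3],
    [-0.6, -0.4, -0.2, -0.5],
    [-0.3, -0.1, -0.4, -0.2],
]
labels = [1, 1, 0, 0]  # 1 = win, 0 = loss

weights, bias = train(rows, labels)
print(weights, bias)
print([round(predict(weights, bias, row), 2) for row in rows])
```

The learned weights are exactly the part that may end up hard to interpret, even when the predictions themselves hold up.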
59
#59
1 Frags +

[quote=Lexx]@GentlemanJon, thanks for the comments. I'm hoping that a good combination of stats which account for damage/frags, heals, ubers and caps would provide enough variation to cover most of the scenarios described in the thread. Ultimately, I'm only interested in the predictive power of the model, even if the successful combination of weights/features would be hard for a human to interpret.[/quote]
Ironically, that would give you a potentially good measurement of skill, but with no idea why.