Data Scientist Requesting Help Building Dashboard
posted in Projects
#1

I’m a data scientist who just built a TF2 Competitive 6s stats dashboard using logs from logs.tf. You can check it out here: [url=https://ebtrout.github.io/tf2_data_vis/]Dashboard Link[/url].

The main feature is the Player Impact Metric (PIM), a 0–10 score made with machine learning that estimates how much a player’s performance contributed to winning, similar to advanced NBA stats. For example, if a scout has a PIM of 6.5, it means:

[b]“Player’s performance ranked in the top 65% of all scouts for match impact.”[/b]

I’m looking for feedback from competitive players on what’s useful, what’s missing, and what could be improved.
Questions for the community are here in this google doc: [url=https://docs.google.com/document/d/1IQiDbn5P5QQpIbC1LSz5n7Azj2tnSoRwpmj682pZjFM/edit?usp=sharing]Google Doc[/url]

Check it out, offer feedback, and if you can answer any of my questions that would be amazing!
#2
Fireside Casts

would you say your PIM is attempting to be similar to that of counter strike's HLTV rating or vlr.gg's valorant player rating?

#3

to answer your questions:

1. Would players find use in a dashboard like this to evaluate their games?
i'd say so, depends on how PIM is calculated (idk if i missed where that is explained) or how roles affect this (ie expect a roamer to have less calculated impact than a pocket soldier). however, i would prefer to see this integrated as an extension on firefox/google instead

2. What stats are players currently using to understand their games?
cant speak for others but usually i look in this order: dpm, dtm, heal %, sometimes kd, and on certain maps (mainly koth/bagel in particular) hp

3. How big of an impact does map / gamemode (koth vs cp) have in how class performs their role
a very big impact, koth vs cp especially has larger dpm/dtm/kills/deaths differences (koth having much higher generally). maps in particular are favoured towards certain roles, for instance, gullywash is a better soldier map than sunshine, so performance/stats will be different. same can be said about bagel vs product (where bagel is stronger for soldiers)

Followup, should players only be compared to other players that played on the same map as them / same gamemode as them?
i believe so, to extend off that, i think comparing between roles (likely based on heal %) is another way to compare player performance (roamer will generally have worse stats than pocket, flank scout vs pocket scout)

4. Is there more data other than what I am sourcing from logs.tf?
not really, most of what is needed can be found there

5. I am limiting these games to only logs that have ETF2l, RGL, LAN, UGC, or scrim in the title. Are there other leagues / keywords I should consider?
a lot of private servers are used for scrims (without scrim in the title of server of course), so ur current keywords are lacking. also, fireside sometimes hosts stuff so that keyword is a good start. unless u are looking for just match logs, which should be fine seeing as rgl uses official servers

notes:
- i doubt its what ur looking for, but i would do something like performance vs heal %, or something similar (maybe include in ur calculation of PIM, however that value is created)
- ik itd very hard as u would likely have to manually change things every season, but if PIM could be used to compare between people on the same role within the same division, it would be more accurate, ie invite roamer vs invite roamer or main demo vs main demo, as different divisions will muddy results
- check out more.tf if you want ideas they have something similar there

edit:
- i looked over the dashboard some more, i love the cp push stats, lot of really interesting stuff there (last conversion rate and defense rate in particular are so cool) but id also like to see how some stats are collected (maybe a description of some sort in the future, like how a roll is calculated). with that said, i tried it on one of my own games (https://logs.tf/3682691) and it seems to have placed people on the wrong roles (for example im on scout on the program)

#4
[quote=siyo]would you say your PIM is attempting to be similar to that of counter strike's HLTV rating or vlr.gg's valorant player rating?[/quote]
I would say so. I want it to be a similar 1 number summary. I do not know how their stats are calculated, but from what I've read, I would say we are trying to accomplish similar things.

Edit: After looking more into the HLTV and vlr.gg player rating, I don't think that PIM and those numbers are the same. The player rating seems to be a [b]global[/b] rating, while PIM only calculates impact on a [b]per-game[/b] basis.

Making something like that for tf2 would be really interesting, but I would definitely need to figure out a way to tell which games are Invite vs pug vs all the other leagues

[quote=saxophone]to answer your questions:....
program)[/quote]
1.
Roles right now are not coded explicitly, but more on that in notes reply. Although I'm not sure what you mean with the firefox extension. If you mean it would be easier for the dashboard to be in an extension... Yes it would, but I don't know if I could make that happen :(
2.
I will definitely add healing% to the match overview so it's easier to access
3.
Okay yeah. I will definitely be looking into model improvements to incorporate the map into the model. It's not as simple as what I am currently doing, but hearing this makes me question the output heavily.
4.
Okay good. Logs.tf outputs data in a very particular way, and if I had to refactor I think I might cry.
5.
Hm. I am aware that a lot of private servers are used for scrims, but it would be very hard to programmatically figure out which of these servers are "Real", as I wanted to only focus on what would be considered "Serious" comp matches and not just random pugs. Will def look into fireside
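
A rough sketch of the title filter from question 5, in Python. The keyword list is the one from the thread plus "fireside" as suggested; the function name `is_serious_title` and the sample titles are made up for illustration, and this is not the dashboard's actual code.

```python
import re

# Keywords from the thread; "fireside" is the addition suggested in the reply.
KEYWORDS = ["etf2l", "rgl", "lan", "ugc", "scrim", "fireside"]

# Word boundaries so "lan" does not match inside a word like "planet";
# the optional "s" lets plurals such as "scrims" through.
_PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")s?\b", re.IGNORECASE)

def is_serious_title(title: str) -> bool:
    """Return True if the log title contains one of the league keywords."""
    return _PATTERN.search(title) is not None

print(is_serious_title("RGL Invite: RED vs BLU"))   # keyword present
print(is_serious_title("planet pug #3"))            # no keyword as a whole word
```

As noted in the thread, a title filter like this will still miss scrims played on private servers whose titles carry no keyword.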
Notes:
Creating pocket vs roamer would be a great idea. I am currently just randomly assigning scout_1/2 soldier_1/2. I was scared to assume pocket vs roamer with heal pct, as I thought that notion was largely defunct. But at least for scout now, post medic heal speed buff, probably not. This would def help with debiasing the model.

I would love to include division. However, logs supplied from logs.tf do not make that data easily accessible. I could try to bind it in from other log providers, but data might get scarce

Edit:
Hm. That's strange on the scout. I currently calculate the class role by looking at which class you played the most time on. I will definitely look into it, because on logs.tf it doesn't even show that you played scout at all....
Tysm!!! This is all amazing and is a huge help.
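
For anyone following along, the "class you played the most time on" rule could look roughly like this in Python. The input shape (a list of `(class_name, seconds_played)` pairs per player) and the name `primary_class` are assumptions for illustration; the real logs.tf data and the dashboard's code may be laid out differently.

```python
from collections import defaultdict

def primary_class(class_stats):
    """Return the class with the most total playtime, or None if empty.

    class_stats: iterable of (class_name, seconds_played) pairs (assumed shape).
    """
    totals = defaultdict(float)
    for class_name, seconds in class_stats:
        totals[class_name] += seconds  # a class can appear more than once
    if not totals:
        return None
    # Iterating names in sorted order makes ties resolve alphabetically.
    return max(sorted(totals), key=lambda name: totals[name])

print(primary_class([("soldier", 1500), ("scout", 40), ("soldier", 200)]))
```

If a player shows up on a class they never played, checking the per-class totals that go into a function like this would be one place to start looking.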
#5

The following is in response to question 2 i guess:

I only just scanned ur doc so sorry if i misunderstood something, but while ratios between k/d or dpm/dtm are definitely useful for evaluating impact, you also need to account for volume. In my experience, the player who goes 30/30 has way more impact (not necessarily good or bad) than the player that goes 10/10. Then you have to factor in that these numbers inflate as the game goes longer, and account for the fact that in slow/stalematey games they won't--so keeping track of max and min values among the players in the match i guess.

Another big hurdle with measuring impact with logs is the timing. Sacking for a med at 80% uber in a stalemate that allows your team to take the point for free and roll ad to the next point vs. sacking for a med after they used and wiped your team--both will read as a 1 for 1 trade on the med via logs, but the former is 1000000 times more impactful. Idk if logs gives enough information to track when kills occur relative to each other, and it would still be p sophisticated to interpret those numbers as events.

gl with the project

#6

There are a lot of interrelated variables when assessing performance. For example:
[list]
[*]Higher DPM is usually better, but the ratio between DPM and DTM also matters. For example, a 300dpm/300dtm damage spread would likely not help your team win as much as a 250dpm/200dtm damage spread.
[*]Different roles have different performance expectations. The best roamers in the world still often go damage negative (while winning) because their role requires taking negative damage trades during sacs.
[*]The pace of the game affects everyone's performance. Doing 300dpm in a 5-4, fast-paced game is arguably less impressive than doing 300dpm in a game with a lot of stalemates (high average time between e.g. capping 2nd and capping last)
[*]The ratio between the "good stats" and heal% also matters. Doing 300dpm while taking 30% heals is less impressive than doing 300dpm while taking 5% heals.
[*]There are many more relationships like these examples.
[/list]
My suggestion, if you're trying to get a single metric to measure player performance, would be to measure performance against a sample of known top-level games. For instance, you could gather a sample of 500 invite scrims and matches and compare player stats (dpm, k/d, etc.) on the winning team and the losing team, measuring the difference between the winners and losers. The key here is not the actual numbers, but the difference between the winners and the losers, e.g., what % more damage did the winners do compared to the losers? What % difference in ubers dropped? and so on for many different stats.

Then you would compare the difference in non-sample games to the difference in the sample. For example, if winning invite Demo players do 10% more damage on average than their demo counterparts (on the losing team), then a 10% damage diff would be the "average" or "difference-adjusted" score for demo players (so, a 5 on a 1-10 scale). Comparing winners and losers in a sample of games would allow you to find which stats directly correlate with winning and losing, even if you don't know why those statistics correlate that way. Doing this for many statistics and then averaging them out (for dpm diff, k/d diff, heal% diff, etc.) would be imperfect but give a pretty good overall measure of performance.
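
A minimal Python sketch of the difference-adjusted idea above, for one stat and one class. The helper names and the linear mapping are assumptions: the description only fixes the sample average at 5, so the `spread` parameter (how many percentage points move the score by 5) is an arbitrary placeholder, not something derived from data.

```python
from statistics import mean

def pct_diff(player_value, opponent_value):
    """Percent by which a stat exceeds the opposing counterpart's stat."""
    return 100.0 * (player_value - opponent_value) / opponent_value

def baseline_diff(sample_pairs):
    """Average winner-vs-loser percent difference over the sample games.

    sample_pairs: list of (winner_value, loser_value) for one stat and class.
    """
    return mean(pct_diff(w, l) for w, l in sample_pairs)

def diff_score(player_diff, baseline, spread=20.0):
    """Map a percent difference to a 1-10 score, with the baseline at 5.

    spread is an assumed scale: baseline + spread would score a 10.
    """
    raw = 5.0 + 5.0 * (player_diff - baseline) / spread
    return max(1.0, min(10.0, raw))  # clamp to the 1-10 range

# Made-up demo damage pairs where winners do 10% more than losers.
sample = [(330, 300), (275, 250), (220, 200)]
base = baseline_diff(sample)
print(diff_score(pct_diff(360, 300), base))
```

Averaging `diff_score` over several stats (dpm diff, k/d diff, heal% diff, and so on) would give the combined number described above. A counterpart stat of zero would need separate handling, since `pct_diff` divides by it.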

Adjusting for roles would also be necessary since expectations vary widely between them. Comparing med to med or demo to demo would be easy, but you could use heal% to determine which players are on combo/flank scout and pocket/roamer. The scout with the higher heal% will almost always be the pocket scout, and same with the pocket soldier. Using a sample of high-level games to train the model would also allow you to handpick who's playing what role, so there wouldn't be mistakes in the data.
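
The heal% rule could be sketched like this in Python. The player dict keys (`name`, `class`, `heal_pct`) and the function name `assign_roles` are assumed for illustration and do not reflect the logs.tf format; the function expects the players of a single team.

```python
def assign_roles(players):
    """Label the scout pair and the soldier pair on one team by heal share.

    players: list of dicts with "name", "class" and "heal_pct" keys (assumed shape).
    Returns {name: role} for the players that could be labelled.
    """
    # (role for the higher heal%, role for the lower heal%)
    labels = {"scout": ("pocket scout", "flank scout"),
              "soldier": ("pocket", "roamer")}
    roles = {}
    for cls, (high, low) in labels.items():
        pair = sorted((p for p in players if p["class"] == cls),
                      key=lambda p: p["heal_pct"], reverse=True)
        if len(pair) != 2:
            continue  # offclassing or missing data: leave unlabelled
        roles[pair[0]["name"]] = high
        roles[pair[1]["name"]] = low
    return roles

team = [
    {"name": "a", "class": "scout", "heal_pct": 18},
    {"name": "b", "class": "scout", "heal_pct": 9},
    {"name": "c", "class": "soldier", "heal_pct": 25},
    {"name": "d", "class": "soldier", "heal_pct": 8},
    {"name": "e", "class": "demoman", "heal_pct": 30},
]
print(assign_roles(team))
```

Since the higher-heal% player is only "almost always" the pocket, a hand-labelled sample of high-level games would still be the safer source of truth for training.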

Sorry if this isn't exactly what you asked for. I've thought a lot about how to measure performance in sixes and it's honestly pretty difficult. Keep this thread updated with your progress; I'm very interested!