There are a lot of interrelated variables when assessing performance. For example:
- Higher DPM is usually better, but the ratio between DPM and DTM also matters. For example, a 300dpm/300dtm damage spread would likely not help your team win as much as a 250dpm/200dtm damage spread.
- Different roles have different performance expectations. The best roamers in the world still often go damage negative (while winning) because their role requires taking unfavorable damage trades during sacs.
- The pace of the game affects everyone's performance. Doing 300dpm in a fast-paced 5-4 game is arguably less impressive than doing 300dpm in a game with a lot of stalemates (e.g., a high average time between capping second and capping last).
- The ratio between the "good stats" and heal% also matters. Doing 300dpm while taking 30% heals is less impressive than doing 300dpm while taking 5% heals.
- There are many more relationships like these examples.
My suggestion, if you're after a single metric for player performance, would be to measure it against a sample of known top-level games. For instance, you could gather a sample of 500 invite scrims and matches and compare player stats (dpm, k/d, etc.) between the winning and losing teams. The key here is not the raw numbers but the difference between winners and losers: what % more damage did the winners do compared to the losers? What % difference in ubers dropped? And so on for many different stats.
Then you would compare the difference in non-sample games to the difference in the sample. For example, if winning invite demo players do 10% more damage on average than their demo counterparts on the losing team, then a 10% damage diff would be the "average" or "difference-adjusted" score for demo players (so, a 5 on a 1-10 scale). Comparing winners and losers in a sample of games would let you find which stats directly correlate with winning and losing, even if you don't know why they correlate that way. Doing this for many statistics and then averaging them out (dpm diff, k/d diff, heal% diff, etc.) would be imperfect but would give a pretty good overall measure of performance.
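As a rough sketch of what that could look like in Python (all the numbers, the 1-10 mapping, and the function names here are made up for illustration, not from any real log parser):

```python
# Sketch: derive a "difference-adjusted" score for one stat from a sample of games.

def baseline_diff(winner_vals, loser_vals):
    """Average % by which winners exceed losers for one stat (e.g. demo dpm)."""
    w = sum(winner_vals) / len(winner_vals)
    l = sum(loser_vals) / len(loser_vals)
    return (w - l) / l  # e.g. 0.10 means winners do 10% more on average

def score(player_diff, baseline, scale=5.0):
    """Map a player's % diff onto a 1-10 scale, where 5 = matches the baseline.

    The linear mapping and clamping are arbitrary choices for this sketch.
    """
    if baseline == 0:
        return 5.0
    raw = 5.0 + scale * (player_diff - baseline) / baseline
    return max(1.0, min(10.0, raw))

# Toy sample: winning vs losing demo dpm across a handful of games.
winning_demo_dpm = [310, 295, 320, 305]
losing_demo_dpm = [280, 270, 290, 275]

base = baseline_diff(winning_demo_dpm, losing_demo_dpm)
print(round(base, 3))                 # winners' average edge over losers
print(round(score(0.10, base), 2))    # a player near the baseline scores ~5
```

You'd run something like this once per stat and per class, then average the per-stat scores into one number.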
Adjusting for roles would also be necessary since expectations vary widely between them. Comparing med to med or demo to demo would be easy, but you could use heal% to determine which players are on combo/flank scout and pocket/roamer. The scout with the higher heal% will almost always be the combo scout, and the soldier with the higher heal% will almost always be the pocket. Using a sample of high-level games to train the model would also let you handpick who's playing what role, so there wouldn't be mistakes in the data.
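The heal% heuristic above is simple enough to sketch directly (player dicts and field names here are hypothetical, just to show the idea):

```python
# Sketch: within one class, tag the higher-heal% player as the combo/pocket role.

def split_roles(players, combo_label, flank_label):
    """Given the two players of one class, label them by heal%.

    Assumes the heal% heuristic from the post: higher heal% -> combo/pocket.
    """
    a, b = sorted(players, key=lambda p: p["heal_pct"], reverse=True)
    return {a["name"]: combo_label, b["name"]: flank_label}

scouts = [{"name": "scout1", "heal_pct": 22.0}, {"name": "scout2", "heal_pct": 9.5}]
soldiers = [{"name": "solly1", "heal_pct": 6.0}, {"name": "solly2", "heal_pct": 18.0}]

roles = split_roles(scouts, "combo scout", "flank scout")
roles.update(split_roles(soldiers, "pocket", "roamer"))
print(roles)
```

This would misfire in the occasional game where, say, a roamer gets unusually high heals, which is exactly why hand-labeling roles in the training sample is worth the effort.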
Sorry if this isn't exactly what you asked for. I've thought a lot about how to measure performance in sixes and it's honestly pretty difficult. Keep this thread updated with your progress; I'm very interested!