Big data mining of TF2 logs

#1

Rogue201

0 Frags

I’m working on a master’s in data science, and I have a course project on distributed data mining. We have the full history of logs from logs.tf to use for our analysis. What questions does the TF2 community have that big data mining could solve?

Here are some ideas we have so far:
-Finding cheaters with outlier analysis
-Heatmap of deaths on maps
-Metastatistics (how likely you are to win the round if you win the midfight, how likely you are to hold last with uber disad, etc)
-Tracking player improvement over time
-Future match prediction
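
For the outlier idea, here is a minimal sketch assuming you have already extracted one number per player from the logs (the headshot rates below are made up, and `mad_outliers` is a hypothetical helper). It uses the modified z-score based on the median and the median absolute deviation, which is less distorted by the very outliers you are hunting than a mean and standard deviation would be.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical per-player sniper headshot rates (made-up numbers)
headshot_rates = [0.21, 0.25, 0.19, 0.23, 0.22, 0.24, 0.20, 0.81]
suspects = mad_outliers(headshot_rates)
print(suspects)
```

A flagged index is only a lead for manual review; a very good player on a lucky night would look the same to this test.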

These are our quick ideas, but we’re very interested in others, so let us know what else you think of! Of course, we wouldn’t be doing all of this; we would likely focus on one or two areas.

Here’s some more technical stuff about the assignment if you’re interested:
Essentially we’re looking to take big data and gain insights from it using really powerful machines (a TB of RAM, 128 threads, 20 GPUs, etc). We’re not allowed to use neural networks (we’re taking that class at the same time), but all other forms of machine learning and data mining are allowed. If you have any ideas for machine learning or data mining methods that you think would be good for this data, hit us with those too!

And of course, huge shoutout to Zoob from logs.tf and his incredible work on that site

#2

perfection

-50 Frags

no bc then i would get banned

#3

Twiggy

12 Frags

Tracking player improvement over time: I suggest you directly ask GentlemanJon, who has done prior "research" in this field.

I would be interested in stalemate data: can you recognise from the logs that a stalemate is happening, what are the most common ways it gets unlocked, on which maps and in which zones does it happen, etc.? You'd have to work with very incomplete data, though, because the log gives you player positions on kill events and not on damage.

#4

Tob

25 Frags

Shift of heal distribution and kill distribution as the pocket scout meta started developing over time.

If you're going to compare %wins if mid fight won, perhaps you could also look into the stats of what happens when ally/enemy meds die/live during the mid fight https://puu.sh/COwuy/d37fc50593.png
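
Both stats here boil down to conditional frequencies. A rough sketch, assuming each round has already been reduced to a small record (the field names `mid_winner`, `round_winner` and `med_died_mid` are invented for illustration, as are the four sample rounds):

```python
def conditional_rate(rounds, condition, outcome):
    """Estimate P(outcome | condition) as a plain frequency over rounds."""
    matching = [r for r in rounds if condition(r)]
    if not matching:
        return None  # the condition never occurred in this sample
    return sum(1 for r in matching if outcome(r)) / len(matching)

# Made-up rounds; med_died_mid is the team whose medic died at mid, or None
rounds = [
    {"mid_winner": "red", "round_winner": "red", "med_died_mid": "blu"},
    {"mid_winner": "red", "round_winner": "blu", "med_died_mid": None},
    {"mid_winner": "blu", "round_winner": "blu", "med_died_mid": "red"},
    {"mid_winner": "blu", "round_winner": "blu", "med_died_mid": "red"},
]

# How often does the team that wins mid also win the round?
p_round_given_mid = conditional_rate(
    rounds,
    condition=lambda r: True,
    outcome=lambda r: r["mid_winner"] == r["round_winner"],
)

# When a medic dies at mid, how often does the other team win the midfight?
p_mid_given_med_pick = conditional_rate(
    rounds,
    condition=lambda r: r["med_died_mid"] is not None,
    outcome=lambda r: r["mid_winner"] != r["med_died_mid"],
)
print(p_round_given_mid, p_mid_given_med_pick)
```

With the full logs.tf history the same function could be run per map or per season, though small groups will give noisy rates.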

#5

Opti_

7 Frags

That sounds like a cool project!
I feel like you would have to do heavy pre-processing to filter the logs and keep only serious scrims/officials.

Metastatistics, as you put it, sound pretty simple to do, and while they could be interesting to know, they're probably not the best thing you can do with the resources you have.
Match (and even log) prediction would actually be really cool to have, but I don't know if you would have enough prior data on the teams/players to do it accurately enough.

Good luck!

#6

Pheaa

4 Frags

It would be interesting to look at how hard it is to push out of last at different stages of advantage. It would be fun to have some data on exactly how hard it is to push out of badlands last vs other maps.

#7

Olgha

0 Frags

As Opti said, you'll need to find relevant logs. I'm up for helping, even if that's only marking which logs are interesting or any other boring stuff you have to do for your project.

#8

hayes

9 Frags

This sounds really cool. My aspirations are to earn a master's in statistics. I've used R and some of the past i-series-lans to do some basic analysis / correlations, but I would be really interested to see it on a larger scale.

The hardest part imo would be finding relevant and reliable logs. Because the skill levels of different teams are so radically different, I think some stats will be inaccurate or skewed. Also, ESEA has some strange and misleading stats, and using scrim data (like from b4nny logs, which have messed up heal numbers) might make it difficult. I could try to find some relevant NA logs though. I think some of the most interesting stats would be:

-Biggest predictor of a player's future success.
-Soldier performance on mids and its correlation to winning mid fights (I guess I only ask because mids seem somewhat random at times).
-Effectiveness of double sacs on last depending on where those players sac from.
-Performance of a team when the other medic dies first (how often does the other team's medic live, how often do they win the midfight, etc.).
-Average stats of a player's class depending on the map.

There are a lot more though; those are just some quick ones off the top of my head.

#9

Klutz__

-3 Frags

What about role indication? As in for 6's, determining who is the pocket/roamer soldier or pocket/flank scout based off of deaths, positioning, and frag types. You could easily apply linear regression to determine something like that.
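
Since pocket vs roamer is a label rather than a number, this is really a classification task, so the sketch below swaps linear regression for a nearest-centroid classifier (not a neural network, so it should be within the assignment rules). The two features and the hand-labelled samples are assumptions made up for illustration.

```python
from math import dist

def centroids(samples):
    """Average the feature vectors of each labelled role."""
    grouped = {}
    for features, role in samples:
        grouped.setdefault(role, []).append(features)
    return {
        role: tuple(sum(col) / len(col) for col in zip(*rows))
        for role, rows in grouped.items()
    }

def predict(model, features):
    """Pick the role whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda role: dist(model[role], features))

# Hypothetical labelled soldiers:
# (share of team heals received, deaths per minute)
labelled = [
    ((0.30, 0.40), "pocket"),
    ((0.28, 0.45), "pocket"),
    ((0.10, 0.80), "roamer"),
    ((0.12, 0.75), "roamer"),
]
model = centroids(labelled)
print(predict(model, (0.27, 0.50)))
```

In practice the features would need scaling to comparable ranges first, otherwise whichever one has the largest numbers dominates the distance.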

#10

alfa

-1 Frags

show me which maps besides viaduct run sniper the most and highest efficiency rating (based on kdr) on those maps, ty

#11

Twiggy

0 Frags

For match prediction, take a look at this:
http://aligulac.com/about/faq/
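
To give a feel for rating-based prediction, here is a sketch using plain Elo. This is a simpler stand-in chosen for illustration, not the method the linked site uses; the starting rating of 1500 and `k=32` are conventional placeholder values, and treating a whole team as one rated entity is an assumption.

```python
def expected_score(rating_a, rating_b):
    """Elo's predicted probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, a_won, k=32.0):
    """Return both ratings after one match; k sets how fast ratings move."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta

team_a, team_b = 1500.0, 1500.0
print(expected_score(team_a, team_b))  # equal ratings give an even match
team_a, team_b = update(team_a, team_b, a_won=True)
print(team_a, team_b)
```

Replaying the log history in date order through `update` would give every team a rating, and `expected_score` then serves as the match prediction.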

#12

guac

11 Frags

I've always wondered if it's possible to train a model that can separate a game into periods of stalemates and fights. If one could just break the timeline of the game down into these two classifications, you could see some much more interesting stats about yourself. How much do you feed during stalemates? How effective are you during fights? How often do you get picks during stalemates? Does your spam output during stalemates correlate with getting an opening pick? I don't know that much about ML myself, so I'm not sure if this is a practical question given your constraints, but I would find it very cool if it could be done.

#13

dbk

2 Frags

Correlation of heals to damage/frags depending on role.

#14

warriordragon12

2 Frags

How long it takes players to return to their usual level after taking extended breaks.

#15

hayes

0 Frags

While I think there are a lot of good ideas being thrown around, how you statistically define those parameters based off the information that logs.tf provides is the main problem. I was thinking specifically about stalemates, and I am not sure what would be the absolute best way to define it (and I'm not sure what logs.tf is capable of). For example, how about something like net kills during a certain time interval? We would expect it to be low, but during times that would not be traditionally a stalemate, this could also be true (rolling out to mid, the moments following a complete wipe of the other team, etc.). How about if both medics have uber, is it always considered a stalemate? There are definitely plenty of circumstances where that is not necessarily true. I've no clue but it would be super neat to know.
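
One way to make the "few kills in a time interval" definition concrete is the sketch below. It assumes kill timestamps in seconds are available per round; the 30-second window, the one-kill limit and the 20-second rollout skip are arbitrary placeholder values, not tuned ones.

```python
def quiet_windows(kill_times, round_start, round_end,
                  window=30, max_kills=1, rollout=20):
    """Return (start, end) windows containing at most max_kills kills.

    The first `rollout` seconds are skipped, since a lack of kills while
    teams are still rolling out to mid is not a stalemate.
    """
    windows = []
    start = round_start + rollout
    while start + window <= round_end:
        end = start + window
        kills = sum(1 for t in kill_times if start <= t < end)
        if kills <= max_kills:
            windows.append((start, end))
        start = end
    return windows

# Made-up round: a midfight, a long quiet spell, then a push
kills = [25, 27, 31, 33, 40, 140, 150, 152, 155]
print(quiet_windows(kills, round_start=0, round_end=170))
```

This handles the rollout case but not the moments after a full wipe; separating those would need a count of players alive, which the sketch does not track.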

#16

glass

7 Frags

[quote=alfa]show me which maps besides viaduct run sniper the most and highest efficiency rating (based on kdr) on those maps, ty[/quote]
do not give him this knowledge

#17

Reero

1 Frags

[quote=Olgha]As Opti said to find relevant logs, i'm up to help you, if that's only marking logs which are interesting, or any other boring stuff you have to do to create your project[/quote]
I feel like a minimum time/map/player count filter is probably a good way to weed out meme logs
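
Such a filter could look roughly like this. The record layout (`length` in seconds, `players`, `map`) is a hypothetical summary format rather than the actual logs.tf schema, and the thresholds are guesses for 6v6.

```python
def is_serious(log, min_length=15 * 60, min_players=12,
               map_prefixes=("cp_", "koth_")):
    """Keep logs that are long enough, full enough and on a real game mode."""
    return (log["length"] >= min_length
            and log["players"] >= min_players
            and log["map"].startswith(map_prefixes))

# Made-up log summaries
logs = [
    {"id": 1, "length": 1800, "players": 12, "map": "cp_process_final"},
    {"id": 2, "length": 240, "players": 12, "map": "cp_badlands"},
    {"id": 3, "length": 1750, "players": 5, "map": "koth_product_rc8"},
    {"id": 4, "length": 1900, "players": 13, "map": "mge_training_v8"},
]
kept = [log["id"] for log in logs if is_serious(log)]
print(kept)
```

An upper bound on player count would also be needed to separate 6v6 from Highlander logs.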

#18

viper

0 Frags

someone already mentioned charting the changes in heal distribution in 6v6 due to meta shifts and balance changes and seeing that would be pretty cool

#19

Zesty

3 Frags

Would be pretty cool if you could create 2 teams by inputting steam ids and generate an entire predicted log from those ids.

[quote=hayes]While I think there are a lot of good ideas being thrown around, how you statistically define those parameters based off the information that logs.tf provides is the main problem. I was thinking specifically about stalemates, and I am not sure what would be the absolute best way to define it (and I'm not sure what logs.tf is capable of). For example, how about something like net kills during a certain time interval? We would expect it to be low, but during times that would not be traditionally a stalemate, this could also be true (rolling out to mid, the moments following a complete wipe of the other team, etc.). How about if both medics have uber, is it always considered a stalemate? There are definitely plenty of circumstances where that is not necessarily true. I've no clue but it would be super neat to know.[/quote]

I think there is actually a way of doing this. I don't think it's possible through logs.tf, but there was a site that took data from demos.tf and showed an animation of the players' locations on the map. I think a really cool project would be to take these demos (or, if you wanted a smaller but perhaps more useful dataset, only invite/prem STVs) and produce a heatmap of player location density. Each tick from each demo would effectively give you 12 datapoints for this, so you get a pretty big dataset. I imagine maps that are famous for stalemating will have concentrated high-density regions (e.g. badlands in lobby and at last). I think this would be more beneficial than heatmaps of deaths, which is something Valve has already done, e.g. with dustbowl.

Also finding cheaters with outlier analysis would be good too.
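
The density part is mostly just binning. A minimal sketch, assuming player positions have already been extracted from demos as `(x, y)` pairs in game units (the sample points and the 256-unit cell size are made up):

```python
from collections import Counter

def density_grid(positions, cell_size=256):
    """Count position samples per square cell, keyed by (column, row)."""
    grid = Counter()
    for x, y in positions:
        # Floor division keeps negative coordinates in their own cells
        grid[(int(x // cell_size), int(y // cell_size))] += 1
    return grid

positions = [(100, 50), (120, 70), (300, 40), (-10, 20), (130, 200)]
grid = density_grid(positions)
print(grid.most_common(1))
```

Because each demo only adds counts to cells, grids from separate demos can be built in parallel and summed afterwards, which suits a distributed setup.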

#20

warriordragon12

0 Frags

[quote=hayes]While I think there are a lot of good ideas being thrown around, how you statistically define those parameters based off the information that logs.tf provides is the main problem.[/quote]

I think a big problem with using TF2 for a project like this is how much of a team game it is and how much variance there can be in the skill of either team in the log. There is just too much that goes into a player's stat line and too little info provided through logs. Teams that only scrim against teams that are worse than or equal to them are going to have inflated stats compared to teams that only play better opponents.

How would you even account for a player messing around in pugs or scrims vs taking it seriously? Using official ESEA matches is too small of a sample

#21

hobophobiccityplanner

-4 Frags

[quote=Twiggy]Tracking player improvement over time : i suggest you ask directly GentlemanJon who has done prior "research" in this field.

I would be interested in stalemates data : can you recognise from the logs that a stalemate happens, and what are the most common ways it gets unlocked, on what maps in what zones does it happen, etc. Although you'd have to work with very incomplete data because the log gives you player positions on kill events and not on damage.[/quote]

If scouts have low DPM, you can tell that a game was slow. I'd imagine that, if you could see the damage output within a certain timeframe, you could use it to pinpoint stalemates. You could also look at offclasses.

#22

Maky

-6 Frags

It may be impossible, but I would like to see a bombing success rate stat.

For instance:

If floyd bombs thru choke, what made his bomb successful?
Was it his speed? Or perhaps his height? Did he take the least amount of damage he could have from the jumps he did? If he were to do the jump again in the same match how likely would it be for him to get similar results?

There may be too many variables, but I think that would be extremely cool if you could do that

#23

Console-

RGB LAN

9 Frags

Running logs from LAN events against logs from online play to determine differences in performance might be neat

#24

pajaro

0 Frags

[quote=Maky]It may be impossible, but I would like to see a bombing success rate stat.

For instance:

If floyd bombs thru choke, what made his bomb successful?
Was it his speed? Or perhaps his height? Did he take the least amount of damage he could have from the jumps he did? If he were to do the jump again in the same match how likely would it be for him to get similar results?

There may be too many variables, but I think that would be extremely cool if you could do that[/quote]
I don't think logs.tf records anything that even indicates when a bomb happens.

#25

Bilbert

0 Frags

I would like to know stats before and after certain weapon patches and whitelist changes. I can't think of any in particular off the top of my head though.

#26

Walrex

3 Frags

[quote=Zesty]I think a really cool project would be to take these demos (or if you wanted a smaller but perhaps more useful dataset you could use only invite/prem stvs) and produce a heatmap of player location density. (Each tick from each demo would effectively give you 12 datapoints for this, so you get a pretty big dataset). I imagine maps that are famous for stalemating will have concentrated high density regions (e.g. badlands in lobby and at last). I think this would be more beneficial than heatmaps of deaths which is something valve has already done e.g. with dustbowl.[/quote]

A thousand times this. A heatmap which shows where players hold most in stalemates would be a great way to look at both map balance and skill level (maps which have overcentralizing points on the heatmap/better players holding in different positions more). You would have to control for logs which are a constant roll back and forth--but any match that goes the full 30 minutes is probably stalematey enough to be reliable.

With all the maptesting going on recently, this would be an asset. Being able to physically see where a map overcentralizes its defensible positions would make balancing and producing competitive maps way WAY easier than the "how does this make you feel" playtesting we have at the moment.

#27

enthrow

6 Frags

I really like the idea of a heatmap of death locations on various maps

#28

Golden111

0 Frags

[quote=EntropyTF]I really like the idea of a heatmap of death locations on various maps[/quote]
This would be epic

#29

Zesty

3 Frags

Ok so I've been thinking about some of the stuff you could do with this over the past couple of days.

I really think it would be cool if you could get information from the [url=https://github.com/demostf/demo.js]demos.tf demo parser[/url], which is what I was referring to earlier (https://demos.tf/viewer to see it in action). This provides a wealth of data that logs don't. There is an unbelievable number of dimensions to the data you could theoretically get at each tick from STVs. It would probably be possible to map things like uber locations, average player health in different parts of the map, sentry spots on last, where individual players stand in their ESEA matches, etc.

Your other option is logs, which are a lot easier to extract data from, but as people have said there are limits to what you can do with them.

I briefly played around with trying to make a heatmap of kill locations (i.e. where the person getting the kill was standing) for individual matches. The hardest part was tying image coordinates from a map overview to ingame coordinates and I could only find overviews for snakewater and badlands where I could do this easily (and I'm not sure that the mapping is totally accurate but it works well enough). I didn't combine multiple logs for this because I didn't have the time but here's a map of Mix^ and iM's kills respectively in (the non golden cap part of) their famous snakewater game:
[img]https://i.imgur.com/ZIgbyjA.jpg[/img]
[img]https://i.imgur.com/IcNQJUh.jpg[/img]
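
For the step of tying in-game coordinates to overview pixels, here is a rough sketch. It assumes the overview image is axis-aligned with the map (no rotation), so each axis is a simple scale and offset that can be fitted from two landmarks located by hand in both systems. The reference points below are made up.

```python
def fit_axis(world_a, world_b, pixel_a, pixel_b):
    """Return (scale, offset) so that pixel = scale * world + offset."""
    scale = (pixel_b - pixel_a) / (world_b - world_a)
    return scale, pixel_a - scale * world_a

def make_mapper(ref1, ref2):
    """Build a world -> pixel function from two (world_xy, pixel_xy) pairs."""
    (w1, p1), (w2, p2) = ref1, ref2
    sx, ox = fit_axis(w1[0], w2[0], p1[0], p2[0])
    sy, oy = fit_axis(w1[1], w2[1], p1[1], p2[1])

    def to_pixel(x, y):
        return sx * x + ox, sy * y + oy

    return to_pixel

# Made-up landmarks: (world x, world y) paired with (pixel x, pixel y)
to_pixel = make_mapper(((-2000, 1000), (100, 200)),
                       ((2000, -1000), (900, 600)))
print(to_pixel(0, 0))
```

The y scale comes out negative here because image rows grow downwards while the made-up world y grows upwards. The two landmarks must differ in both x and y, and a rotated overview would need a full affine fit from three points instead.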