tf2statr: Compile and aggregate TF2 stats in R
posted in Projects
1
#1
0 Frags +

Managing logs and figuring out tournament, season or even 'last 3 games' stats is a largely manual process. logs.tf shouldn't include this since its whole purpose is to be lightweight and fast. This utility makes it easy to grab these logs and interrogate them using R. I chose R mostly because it's the fastest for me to code in, but also because it has a built-in statistical framework and hands off easily to other packages for things like machine learning and forecasting.

You can find the package on [url=https://github.com/sidjai/tf2statr]GitHub[/url].

In terms of functionality, you can do things like "logs <- queryLogstf(tournament = "Insomnia52")" to grab all the logs from the tournament, then get a giant table of mean per-match stats using something like "tab <- aggregateStats(logs)". A more complete use case can be found as a [url=https://github.com/sidjai/tf2statr/blob/master/vignettes/basicQueries.rmd]vignette[/url] or in [url=http://decision-cream.com/2015/09/16/tf2statr-v0.1.0-Released.html]compiled HTML[/url] form.
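To give a feel for what a "mean per-match stats" table looks like, here is a minimal base R sketch on made-up data. It does not call tf2statr and is not how aggregateStats is implemented; the data frame, its column names, and the values are assumptions for illustration only.

```r
# Toy per-match player stats; players, IDs and values are made up.
matches <- data.frame(
  player = c("A", "B", "A", "B"),
  log_id = c(1, 1, 2, 2),
  kills  = c(20, 15, 30, 25),
  deaths = c(10, 12, 14, 8)
)

# Average each stat column per player across all of their matches.
meanPerMatch <- aggregate(cbind(kills, deaths) ~ player, data = matches, FUN = mean)
print(meanPerMatch)
```

The result has one row per player and one column per averaged stat, which is roughly the shape of table described above.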

[b]Note about web scraping worries[/b]

I use the logs.tf API to get the raw stats and then parse them, rather than scraping the log page. So yes, you can track the "number of donks" per game. I do use comp.tf to get the log IDs for events, but I cache the results in an archive so the site isn't hit again. If we share our archives, either by sending me the file or by submitting a pull request, the archive could eventually cover all pro TF2 events.
On a related note, UGC and ESEA leagues are not supported because their logs aren't up on comp.tf. Creative queries to the logs.tf search API can be done using the tools provided, but top-level querying will result in an error. I encourage everyone to help populate these fields and use the website, because it's a standardized and easy way to look back at the pro TF2 landscape.
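The caching idea can be sketched roughly like this. This is only an illustration of the pattern, not the package's actual archive code: cachedLogIds, fakeFetch, the archive layout, and the log IDs are all hypothetical.

```r
# Hypothetical helper: return cached log IDs for an event if the archive
# already has them; otherwise call `fetch` once and save the result.
cachedLogIds <- function(event, fetch, archivePath) {
  archive <- if (file.exists(archivePath)) readRDS(archivePath) else list()
  if (is.null(archive[[event]])) {
    archive[[event]] <- fetch(event)  # the only place the remote site is hit
    saveRDS(archive, archivePath)
  }
  archive[[event]]
}

# Stand-in for the real lookup; it just counts how often it gets called.
calls <- 0
fakeFetch <- function(event) {
  calls <<- calls + 1
  c(101, 102)  # placeholder IDs, not real logs
}

archivePath <- tempfile(fileext = ".rds")
first  <- cachedLogIds("Insomnia52", fakeFetch, archivePath)
second <- cachedLogIds("Insomnia52", fakeFetch, archivePath)
print(calls)
```

Because the archive is just a file, sharing it means the next person never has to hit the site for events that are already in it.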

[b]Future[/b]

- SizzlingStats (ss) support
- Team queries
- Make a "Media notes" automatic document for upcoming games.
2
#2
5 Frags +

dude.

Dude.

I've been doing this in Python and databases, which is kinda awkward. This is perfect.

3
#3
1 Frags +

Nice! Haven't used R in a bit, but this'll be a good excuse to play around in it again!

4
#4
2 Frags +

I hate R with a burning passion.

Nice work though, I'm glad you can tolerate it.

5
#5
0 Frags +

I was like 2 inches away from throwing it all into a database, but I thought people would like a personalized library instead ... especially with the player name archives, which still look kinda awful. The problem with R databases is the problem with R as a whole: there are a million packages to do the same thing and only like 5 are supported.

If there is enough support, I may just make a Shiny app so you don't have to go into R. If you don't like R, there are a lot of Python-R bridges like [url=http://rpy.sourceforge.net/]this one[/url].
6
#6
1 Frags +

I tried to do something similar with my i55 graphs, but this is a much better way of doing this.

Hope to see this being used someday in production.

7
#7
2 Frags +

btw you don't have to webscrape logs.tf; if you put json in the path it will give you a JSON response of the compiled stats.

example: http://logs.tf/json/1027894

Sounds like the way this is doing it is more accurate anyway but just a thought.
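For anyone who wants to poke at that endpoint from R, a rough sketch is below. Only the URL pattern comes from the example above; logJsonUrl is a hypothetical helper, and the fetch is commented out because it needs network access and the jsonlite package. I'm not assuming anything about the fields in the response, so inspect it before relying on it.

```r
# Build the JSON URL for a log ID, following the pattern in the example.
logJsonUrl <- function(logId) {
  sprintf("http://logs.tf/json/%d", as.integer(logId))
}

url <- logJsonUrl(1027894)
print(url)

# Fetching needs network access and the jsonlite package:
# log <- jsonlite::fromJSON(url)
# names(log)  # look at the top-level fields before using any of them
```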

8
#8
0 Frags +

That's what I meant by "JSON API", so I am looking at the raw output. There is so much stuff in there that I had to get rid of some of it (with an option to keep it) just to make sense of everything.

I am not entirely sure about the accuracy though. I wonder if some configs or plugins need after-the-fact adjustments that logs.tf makes on its backend. In any case, the best solution for this project would still have been to output the raw stats and maybe add another function to adjust things. If there are adjustments to make, they would be easy to do for leagues and tournaments since those almost always use the same configs throughout.

There have been some corrupted logs but I haven't checked for accuracy. That is why in the i55 data set Froyo players have 17 or 16 games played even though they all played in the same matches.
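A quick way to spot that kind of mismatch is to count games per player and see which logs the short players are missing. This is just a sketch on made-up data, assuming one row per player per log; the column names are hypothetical and not what the package produces.

```r
# Toy data: one row per player per log; names and IDs are made up.
appearances <- data.frame(
  player = c("p1", "p2", "p1", "p2", "p1"),
  log_id = c(1, 1, 2, 2, 3)
)

# Games played per player.
gamesPlayed <- table(appearances$player)

# Players with fewer games than the maximum are worth a second look.
short <- names(gamesPlayed)[as.vector(gamesPlayed < max(gamesPlayed))]

# For each of those players, list the logs they do not appear in.
allLogs <- unique(appearances$log_id)
missingLogs <- lapply(setNames(short, short), function(p) {
  setdiff(allLogs, appearances$log_id[appearances$player == p])
})
print(missingLogs)
```

The logs that show up here are the ones to check by hand for corruption or a missing player entry.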

9
#9
SizzlingStats
0 Frags +

If you do decide to do something like this for SS, ask us for a db dump.
https://www.reddit.com/r/truetf2/comments/1ocfnp/data_mining_sizzling_stats_what_kinds_of_stats_do/ccr2r6o
