clone_logs: maintain a local clone of logs.tf data
posted in Projects
#1

Hello,

During the "good guy"/"bad guy" list debacle, I was made aware that some people were interested in a cleaned-up version of the logs.tf dataset. I wrote a script to port the data to a simpler and more legible schema using sqlite3, then added the ability to update it with fresh data from the API. It depends only on Python 3, which should be a common tool for data scientists; no external libraries are required, for ease of use. The schema can be read at the beginning of the script.

https://github.com/ldesgoui/clone_logs

Clones of the first 2378000 logs, processed in 100k chunks, as well as a CSV dump containing only the chat logs up until April 2019, can be found at: https://mega.nz/#F!l9oGiKCb!lTWT2RSkTYv-TJZb92_ksA (you don't need the script to make use of them; they're just sqlite3 databases)
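
Since the archives are plain sqlite3 databases, you can poke at them with nothing but the Python standard library. A minimal sketch, assuming you've downloaded one of the files: the function name `list_tables` is made up for illustration, and rather than guessing at the schema it just asks SQLite which tables exist via `sqlite_master`.

```python
import pathlib
import sqlite3


def list_tables(db_path):
    """Return (name, CREATE statement) pairs for every table in the database.

    `list_tables` is a hypothetical helper, not part of clone_logs.
    """
    # Build a read-only URI so an archive file is never modified by accident.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Pass it the path of any downloaded archive and print the returned CREATE statements to see the actual schema.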

Bests,

Computer nerd

#2

very cool, and epic

#3

torrent, rclone, syncthing
if it's just a one-time snapshot i'd go for torrent, if it's gonna be updated occasionally probably syncthing

edit: i just realized it's an archive, how much bigger is it unpacked? if it's too big syncthing/rclone is probably out
eedit: 130mb per, i should really just read posts

#4
[quote=zen]edit: i just realized it's an archive, how much bigger is it unpacked? if it's too big syncthing/rclone is probably out
eedit: 130mb per, i should really just read posts[/quote]

Awkward wording on my part: by "archive" I meant the entire history. It's uncompressed. I haven't waited through compression yet because it'd take ages, but the few tests I ran gave an average compression ratio of 4.25, so the grand total should come to 7 gigabytes.
I also realized I could use LTE to upload. I'll do that once I'm done catching up with the few months I'm missing (2320000 to 2370000).
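
For anyone wanting to redo the size estimate above, here's a rough sketch of the arithmetic. The post doesn't say which compressor the tests used, so `lzma` here is an assumption, and both function names are hypothetical.

```python
import lzma


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size for one blob of bytes."""
    # Assumption: lzma stands in for whatever compressor was actually tested.
    compressed = lzma.compress(data)
    return len(data) / len(compressed)


def estimated_compressed_size(uncompressed_bytes: float, ratio: float) -> float:
    """Projected size after compression, given a measured ratio."""
    return uncompressed_bytes / ratio
```

Measure the ratio on a sample of one database, then divide the total uncompressed size by it to project the compressed total.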

EDIT: Uploaded a bunch to MEGA and updated OP

#5

Is there an updated clone with the latest 600k? Or is the play to run

clone_logs.py --import archive/*.sqlite3
clone_logs.py --range 2_400_001 3_119_515

?
(Assuming I'm understanding the docs for the range command correctly).
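
If you'd rather fetch that range in 100k batches like the archives in the OP, the split can be sketched as below. This says nothing about how clone_logs.py treats --range internally (not checked here); `chunk_ranges` is a hypothetical helper, and the ids are the ones from this post.

```python
def chunk_ranges(first_id, last_id, chunk_size=100_000):
    """Yield inclusive (start, end) id pairs covering first_id..last_id.

    Hypothetical helper for illustration, not part of clone_logs.
    """
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        yield start, end
        start = end + 1


# int() accepts underscore digit separators in Python 3.6+,
# so the ids can be written the same way as in the post.
first, last = int("2_400_001"), int("3_119_515")
chunks = list(chunk_ranges(first, last))
```

For these ids that gives 8 batches covering 719515 logs in total.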

sorry for necroposting.

#6

Not to my knowledge, no
