Help with TF2 database design

#1

Lexx_

0 Frags – +

I started working on a database to store all the required information about a TF2 league. The model I will be using is ETF2L season 7 and the data will be extracted from TF2 log files.

I'm currently working on a design of the database itself. Now I'm by no means an expert on relational databases and I imagine that most people on this forum aren't either, however if any of you guys has any experience with this stuff either from school or work, any sort of feedback or comments would be greatly appreciated. Alternatively, is there a better spot I could ask for help with this stuff?

So here's a link to a current entity-relationship diagram (update here) that I drew. The relationships between the entities that I identified are represented using crow's foot notation. The diagram is based on the following description I came up with. Please let me know if there's anything important that I may have overlooked in the description.

A TF2 6vs6 league is composed of a number of annual seasons, each containing a number of skill-based divisions. Each division consists of a series of matches played by teams composed of 6 players. Team line-ups are usually fixed, however substitutes are allowed if the opposing team agrees. Matches are played on different game maps and, depending on their outcome, teams are awarded points which add up to give the rankings within the tournament. Depending on the type of map (KotH or CP), a team needs to win several rounds in a specific time in order to win the match. During a match, players cause different types of events to occur (e.g. a player changed class or a player killed an enemy or the medic healed a team mate etc.). These events happen in succession (in discrete time) and may influence the player who caused the event or other players (either team mates or enemies). The events ultimately dictate the outcome of a match and their net effects are added up into different statistical factors (e.g. kills/deaths ratio) which may indicate the individual skill of a player.

I started working on a database to store all the required information about a TF2 league. The model I will be using is ETF2L season 7 and the data will be extracted from TF2 log files.

I'm currently working on a design of the database itself. Now I'm by no means an expert on relational databases and I imagine that most people on this forum aren't either, however if any of you guys has any experience with this stuff either from school or work, any sort of feedback or comments would be greatly appreciated. Alternatively, is there a better spot I could ask for help with this stuff?

So here's a link to a current [url=http://postimg.org/image/7sxua0bh5/]entity-relationship diagram[/url] (update [url=http://postimg.org/image/h766ci6ft/]here[/url]) that I drew. The relationships between the entities that I identified are represented using [url=http://www.data-e-education.com/ERM/images/ERM_21_E_R_Model_Crows_Foot_Notation_02.png]crow's foot notation[/url]. The diagram is based on the following description I came up with. Please let me know if there's anything important that I may have overlooked in the description.

[quote]A TF2 6vs6 league is composed of a number of annual seasons, each containing a number of skill-based divisions. Each division consists of a series of matches played by teams composed of 6 players. Team line-ups are usually fixed, however substitutes are allowed if the opposing team agrees. Matches are played on different game maps and, depending on their outcome, teams are awarded points which add up to give the rankings within the tournament. Depending on the type of map (KotH or CP), a team needs to win several rounds in a specific time in order to win the match. During a match, players cause different types of events to occur (e.g. a player changed class or a player killed an enemy or the medic healed a team mate etc.). These events happen in succession (in discrete time) and may influence the player who caused the event or other players (either team mates or enemies). The events ultimately dictate the outcome of a match and their net effects are added up into different statistical factors (e.g. kills/deaths ratio) which may indicate the individual skill of a player.[/quote]

#2

mana

-1 Frags – +

I never took any database classes, however, it seems like your diagram is tough to understand because I'm not sure what each term is referring to. Items? Classes? Rankings? Statistics? A lot of this design hinges on how the league itself is designed (for example, you're only supporting seasons, when ETF2L does occasionally do one night cups and other tournaments).

It'd be way easier to design around a league already in mind, rather than the other way around. i.e. the only reason I'd continue on this project is if someone is in need of a database for a league, and they had a reasonable plan, and some kind of accredited background (because who wants to just waste time).

It's also a slippery slope to say that the KA/D ratio can determine the individual skill of a player. It's much better to associate the match with a logs.tf or SizzlingStats log to see the overall picture.

In short, is there any reason you want to build this database? For fun? Because that's a very complicated road to go down if you actually want to accomplish your goal.

I never took any database classes, however, it seems like your diagram is tough to understand because I'm not sure what each term is referring to. Items? Classes? Rankings? Statistics? A lot of this design hinges on how the league itself is designed (for example, you're only supporting seasons, when ETF2L does occasionally do one night cups and other tournaments).

It'd be way easier to design around a league already in mind, rather than the other way around. i.e. the only reason I'd continue on this project is if someone is in need of a database for a league, and they had a reasonable plan, and some kind of accredited background (because who wants to just waste time).

It's also a slippery slope to say that the KA/D ratio can determine the individual skill of a player. It's much better to associate the match with a logs.tf or SizzlingStats log to see the overall picture.

In short, is there any reason you want to build this database? For fun? Because that's a very complicated road to go down if you actually want to accomplish your goal.

#3

mansfield7

0 Frags – +

A division doesn't have a ranking, it has teams. A ranking is displaying teams based on their match results, it's not an entity by itself.

You'd also want an intersection table for player and team, since even if a player is only going to be on one active team at a time you'd want to keep a history, and account for teams going inactive, etc.

I think this is a scenario where you really need fully defined requirements as to what a league is going to be before you start worrying about how to store the data.

A division doesn't have a ranking, it has teams. A ranking is displaying teams based on their match results, it's not an entity by itself.

You'd also want an intersection table for player and team, since even if a player is only going to be on one active team at a time you'd want to keep a history, and account for teams going inactive, etc.

I think this is a scenario where you really need fully defined requirements as to what a league is going to be before you start worrying about how to store the data.

#4

Not_Matlock

-3 Frags – +

I have a little bit of experience with databases. Access and some MySQL. Unfortunately I'm at work atm.

#5

Benroads

0 Frags – +

I'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?

#6

mansfield7

1 Frags – +

BenroadsI'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?

Diagrams themselves aren't that important as long as you understand the concepts of normalization.

[quote=Benroads]I'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?[/quote]
Diagrams themselves aren't that important as long as you understand the concepts of normalization.

#7

Benroads

0 Frags – +

mansfield7BenroadsI'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?Diagrams themselves aren't that important as long as you understand the concepts of normalization.

Yeah I dunno I kinda regret taking this teacher as he does not post anything on blackboard and if you miss class and ask for the handouts/whatever happened in class he gets annoyed. Most likely going to have to retake it since his first test was like 4 pages of SQL syntax (written) and I bombed it.

[quote=mansfield7][quote=Benroads]I'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?[/quote]
Diagrams themselves aren't that important as long as you understand the concepts of normalization.[/quote]

Yeah I dunno I kinda regret taking this teacher as he does not post anything on blackboard and if you miss class and ask for the handouts/whatever happened in class he gets annoyed. Most likely going to have to retake it since his first test was like 4 pages of SQL syntax (written) and I bombed it.

#8

mansfield7

1 Frags – +

Benroadsmansfield7BenroadsI'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?Diagrams themselves aren't that important as long as you understand the concepts of normalization.
Yeah I dunno I kinda regret taking this teacher as he does not post anything on blackboard and if you miss class and ask for the handouts/whatever happened in class he gets annoyed. Most likely going to have to retake it since his first test was like 4 pages of SQL syntax (written) and I bombed it.

feel free to dm me if you need any help or anything

[quote=Benroads][quote=mansfield7][quote=Benroads]I'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?[/quote]
Diagrams themselves aren't that important as long as you understand the concepts of normalization.[/quote]

Yeah I dunno I kinda regret taking this teacher as he does not post anything on blackboard and if you miss class and ask for the handouts/whatever happened in class he gets annoyed. Most likely going to have to retake it since his first test was like 4 pages of SQL syntax (written) and I bombed it.[/quote]
feel free to dm me if you need any help or anything

#9

Lexx_

0 Frags – +

manaI never took any database classes, however, it seems like your diagram is tough to understand because I'm not sure what each term is referring to. Items? Classes? Rankings? Statistics? A lot of this design hinges on how the league itself is designed (for example, you're only supporting seasons, when ETF2L does occasionally do one night cups and other tournaments).

It'd be way easier to design around a league already in mind, rather than the other way around. i.e. the only reason I'd continue on this project is if someone is in need of a database for a league, and they had a reasonable plan, and some kind of accredited background (because who wants to just waste time).

It's also a slippery slope to say that the KA/D ratio can determine the individual skill of a player. It's much better to associate the match with a logs.tf or SizzlingStats log to see the overall picture.

In short, is there any reason you want to build this database? For fun? Because that's a very complicated road to go down if you actually want to accomplish your goal.

I'm building the database in order to learn these techniques better and build a "programmer's portfolio" before seeking jobs in the field. It's also my way of contributing back to the community. If anything ever comes out of this, it will be released open-source on GitHub. But first and foremost, it's a personal project meant to help me develop some practical skills.

The database itself will later be used for a machine learning application which will try to model the dynamics of a TF2 competition and try to predict the outcome of a future season. I chose to focus on ETF2L because it seemed easier to find logs for matches in the prem division. Not to say that I'm not interested in ETF2L cups, but I guess these would be modeled in a similar way to an ordinary league. Is there any important difference between

I haven't started designing that bit yet, however I'm going to have to rely on some sort of "performance metrics" such as KA/D for this (just as an example). Think of the players ratings from a Fifa game. Correct me if I'm wrong, but I think you can't tell a program to "look at a log file and see the big picture", that would need artificial intelligence far beyond what I am now able to code. Instead, I think I will have to rely on individual player statistics. Let me know if you have any comments about this. What would be a good metric for the quality of play in TF2?.

mansfield7A division doesn't have a ranking, it has teams. A ranking is displaying teams based on their match results, it's not an entity by itself.

You'd also want an intersection table for player and team, since even if a player is only going to be on one active team at a time you'd want to keep a history, and account for teams going inactive, etc.

I think this is a scenario where you really need fully defined requirements as to what a league is going to be before you start worrying about how to store the data.

Thanks for the comments. You're definitely right about the need to have well defined requirements. In my first post on the thread I added a first version of these, based on my current understanding of how a TF2 competition works. I'm not sure if I got everything right, so any corrections/additions would be very helpful.

Good catch on the intersection table between Players and Teams. Perhaps I need a similar junction table between Teams and Matches as well? And if I add a relationship between Divisions and Teams, then I guess the one between Divisions and Matches becomes redundant. I'm not too good at this whole database normalization thing yet. Here's an updated entity-relationship diagram, I hope I understood your comments well. I'm still not sure what to do about the division table/rankings. Is this an association with derived fields? Or does this only appear in the visual reports generated from the application?

[quote=mana]I never took any database classes, however, it seems like your diagram is tough to understand because I'm not sure what each term is referring to. Items? Classes? Rankings? Statistics? A lot of this design hinges on how the league itself is designed (for example, you're only supporting seasons, when ETF2L does occasionally do one night cups and other tournaments).

It'd be way easier to design around a league already in mind, rather than the other way around. i.e. the only reason I'd continue on this project is if someone is in need of a database for a league, and they had a reasonable plan, and some kind of accredited background (because who wants to just waste time).

It's also a slippery slope to say that the KA/D ratio can determine the individual skill of a player. It's much better to associate the match with a logs.tf or SizzlingStats log to see the overall picture.

In short, is there any reason you want to build this database? For fun? Because that's a very complicated road to go down if you actually want to accomplish your goal.[/quote]

I'm building the database in order to learn these techniques better and build a "programmer's portfolio" before seeking jobs in the field. It's also my way of contributing back to the community. If anything ever comes out of this, it will be released open-source on [url=https://github.com/]GitHub[/url]. But first and foremost, it's a personal project meant to help me develop some practical skills.

The database itself will later be used for a machine learning application which will try to model the dynamics of a TF2 competition and try to predict the outcome of a future season. I chose to focus on ETF2L because it seemed easier to find logs for matches in the prem division. Not to say that I'm not interested in ETF2L cups, but I guess these would be modeled in a similar way to an ordinary league. Is there any important difference between

I haven't started designing that bit yet, however I'm going to have to rely on some sort of "performance metrics" such as KA/D for this (just as an example). Think of the [url=http://www.easports.com/fifa/fut-hub/fut-guide/features/player-ratings-explained]players ratings[/url] from a Fifa game. Correct me if I'm wrong, but I think you can't tell a program to "look at a log file and see the big picture", that would need artificial intelligence far beyond what I am now able to code. Instead, I think I will have to rely on individual player statistics. Let me know if you have any comments about this. What would be a good metric for the quality of play in TF2?.

[quote=mansfield7]A division doesn't have a ranking, it has teams. A ranking is displaying teams based on their match results, it's not an entity by itself.

You'd also want an intersection table for player and team, since even if a player is only going to be on one active team at a time you'd want to keep a history, and account for teams going inactive, etc.

I think this is a scenario where you really need fully defined requirements as to what a league is going to be before you start worrying about how to store the data.[/quote]

Thanks for the comments. You're definitely right about the need to have well defined requirements. In my first post on the thread I added a first version of these, based on my current understanding of how a TF2 competition works. I'm not sure if I got everything right, so any corrections/additions would be very helpful.

Good catch on the intersection table between Players and Teams. Perhaps I need a similar junction table between Teams and Matches as well? And if I add a relationship between Divisions and Teams, then I guess the one between Divisions and Matches becomes redundant. I'm not too good at this whole database normalization thing yet. Here's an [url=http://postimg.org/image/vrkspagxv/]updated entity-relationship diagram[/url], I hope I understood your comments well. I'm still not sure what to do about the division table/rankings. Is this an association with derived fields? Or does this only appear in the visual reports generated from the application?

#10

nTraum

2 Frags – +

If you're interested in seeing how ETF2L's database structure looks like, I'm sure we can send you a schema dump. May makes things a little easier for you. As always, things are a little more complicated than you think (playoffs, tiebrakers to name a few).

I'd suggest to examine the league system and the log of a match separately / independently at the beginning, as both are pretty complex.

If you're interested in seeing how ETF2L's database structure looks like, I'm sure we can send you a schema dump. May makes things a little easier for you. As always, things are a little more complicated than you think (playoffs, tiebrakers to name a few).

I'd suggest to examine the league system and the log of a match separately / independently at the beginning, as both are pretty complex.

#11

Lexx_

0 Frags – +

@nTraum, that would be great, thanks! That would definitely help me with the design, even if I will end up simplifying things for a bit.

As you can imagine, most of my work so far has been guided by the information I got from log files, as these will be primary data source. This explains the "Events" association for example.

@nTraum, that would be great, thanks! That would definitely help me with the design, even if I will end up simplifying things for a bit.

As you can imagine, most of my work so far has been guided by the information I got from log files, as these will be primary data source. This explains the "Events" association for example.

#12

EmilioEstevez

1 Frags – +

You really don't need to write has/is caused by/affects all over the place, the crows feet notation is already more than clear enough. Also what are those diamonds supposed to represent, they seem to be very inconsistently used.

#13

RussianGuyovich

1 Frags – +

mansfield7Diagrams themselves aren't that important as long as you understand the concepts of normalization.

+1

Can't say I've ever created a UML diagram before I started developing which didn't end up binned because of some changes to the design made along the way. Just make sure you don't store duplicate, or even worse, conflicting information where it isn't needed. Consider using identity columns where a primary key isn't easily defined (i.e. unique constraint isn't possible) and use them as foreign keys in your dependant tables.

[quote=mansfield7]
Diagrams themselves aren't that important as long as you understand the concepts of normalization.[/quote]

+1

Can't say I've ever created a UML diagram before I started developing which didn't end up binned because of some changes to the design made along the way. Just make sure you don't store duplicate, or even worse, conflicting information where it isn't needed. Consider using identity columns where a primary key isn't easily defined (i.e. unique constraint isn't possible) and use them as foreign keys in your dependant tables.

#14

EmilioEstevez

1 Frags – +

Also if you're doing this to show to prospective employers, you'll be much better served by showing you have a good grasp of SQL, indexing strategies and understanding query plans, etc.

If you can pass the Oracle SQL fundamentals exam or the SQL Server equivalent that'll go a long way imo (they're not difficult, but they cost a little bit of money to take) and also set you along the path to getting MCP or OCP accreditation.

Also if you're doing this to show to prospective employers, you'll be much better served by showing you have a good grasp of SQL, indexing strategies and understanding query plans, etc.

If you can pass the Oracle SQL fundamentals exam or the SQL Server equivalent that'll go a long way imo (they're not difficult, but they cost a little bit of money to take) and also set you along the path to getting MCP or OCP accreditation.

#15

Lexx_

1 Frags – +

@EmilioEstevez, the diamonds represent associative entities. Regarding SQL, it is not the essential point of my application, which I've described in an earlier post on this thread. The role of the database is simply to store the data required for the learning algorithm that I will implement. The data acquisition and manipulation will be done in R, using an SQL-enabling library called sqldf. So the main functionality of this application lies within the fields of machine learning and data mining, and this is the area where I will be seeking employment.

@RussianGuyovich, thanks for the advice. During my undergrad years we've even been taught to add numerical primary keys to tables that would contain an otherwise suitable primary key (such as the map_name in Maps). It's been some time since then, so now I don't even remember the logic behind that choice (if there ever was one).

@EmilioEstevez, the diamonds represent [url=http://en.wikipedia.org/wiki/Associative_Entities]associative entities[/url]. Regarding SQL, it is not the essential point of my application, which I've described in an earlier post on this thread. The role of the database is simply to store the data required for the learning algorithm that I will implement. The data acquisition and manipulation will be done in R, using an SQL-enabling library called [url=http://code.google.com/p/sqldf/]sqldf[/url]. So the main functionality of this application lies within the fields of machine learning and data mining, and this is the area where I will be seeking employment.

@RussianGuyovich, thanks for the advice. During my undergrad years we've even been taught to add numerical primary keys to tables that would contain an otherwise suitable primary key (such as the map_name in Maps). It's been some time since then, so now I don't even remember the logic behind that choice (if there ever was one).

#16

RussianGuyovich

2 Frags – +

The idea behind identity columns is the value is auto generated by the database engine as opposed to a naturally-defined key on one or multiple columns. While you usually don't specify the value of this identity column during an insert statement, there are ways to perform a manual insert assuming it's a unique value (treated as a constraint).

One big advantage with using identities for a surrogate key compared to natural keys (such as map name) is that your referential integrity will be intact compared to a scenario where the information used for your natural keys might have to be updated.

Even if you don't intend on using an identity column for referential integrity, they are good for large tables which would benefit from a clustered index being present to optimise query plans that would otherwise take too large a storage hit if creating a nonclustered index on one or several columns on that table.

Just to mention, I come from primarily a SQL Server background. The Oracle equivalent to identity columns are sequences.

The idea behind identity columns is the value is auto generated by the database engine as opposed to a naturally-defined key on one or multiple columns. While you usually don't specify the value of this identity column during an insert statement, there are ways to perform a manual insert assuming it's a unique value (treated as a constraint).

One big advantage with using identities for a surrogate key compared to natural keys (such as map name) is that your referential integrity will be intact compared to a scenario where the information used for your natural keys might have to be updated.

Even if you don't intend on using an identity column for referential integrity, they are good for large tables which would benefit from a clustered index being present to optimise query plans that would otherwise take too large a storage hit if creating a nonclustered index on one or several columns on that table.

Just to mention, I come from primarily a SQL Server background. The Oracle equivalent to identity columns are sequences.

#17

mansfield7

1 Frags – +

LexxThanks for the comments. You're definitely right about the need to have well defined requirements. In my first post on the thread I added a first version of these, based on my current understanding of how a TF2 competition works. I'm not sure if I got everything right, so any corrections/additions would be very helpful.

Good catch on the intersection table between Players and Teams. Perhaps I need a similar junction table between Teams and Matches as well? And if I add a relationship between Divisions and Teams, then I guess the one between Divisions and Matches becomes redundant. I'm not too good at this whole database normalization thing yet. Here's an updated entity-relationship diagram, I hope I understood your comments well. I'm still not sure what to do about the division table/rankings. Is this an association with derived fields? Or does this only appear in the visual reports generated from the application?

I don't think you need an intersection table between teams and matches. A match is limited to two teams, so you can just have both team ids in your match table.

As for the ranking, it only appears in visual reports. You already have all the data, you'd just say get the teams in this division ordered by number of wins. Now, the application might choose to cache that data in a table, but that's not part of your main data storage concern.

[quote=Lexx]
Thanks for the comments. You're definitely right about the need to have well defined requirements. In my first post on the thread I added a first version of these, based on my current understanding of how a TF2 competition works. I'm not sure if I got everything right, so any corrections/additions would be very helpful.

Good catch on the intersection table between Players and Teams. Perhaps I need a similar junction table between Teams and Matches as well? And if I add a relationship between Divisions and Teams, then I guess the one between Divisions and Matches becomes redundant. I'm not too good at this whole database normalization thing yet. Here's an [url=http://postimg.org/image/vrkspagxv/]updated entity-relationship diagram[/url], I hope I understood your comments well. I'm still not sure what to do about the division table/rankings. Is this an association with derived fields? Or does this only appear in the visual reports generated from the application?[/quote]

I don't think you need an intersection table between teams and matches. A match is limited to two teams, so you can just have both team ids in your match table.

As for the ranking, it only appears in visual reports. You already have all the data, you'd just say get the teams in this division ordered by number of wins. Now, the application might choose to cache that data in a table, but that's not part of your main data storage concern.

#18

Lexx_

1 Frags – +

Cool, thanks again @mansfield7 for your comments. And @RussianGuyovich, thanks for the refresher on the logic behind the types of keys. Your post brought on a bunch of flashbacks from my undergrad courses. It's a shame I haven't been in touch with this stuff in a bunch of years now.

#19

Rigel

2 Frags – +

If you'd like to see how I did a similar project, here's my code/readme:

https://drive.google.com/folderview?id=0B4KF681WVs2zU2tlV3k0TjVMX1E&usp=sharing

Take whatever you want. It worked out pretty well, but if I wanted an entirely secure and idiot proof database, it needs a lot more functions for checking and such. It's possible for a match to have more than two teams playing in it, stuff like that. It was my first implementation of a database so there are obviously flaws, but maybe it could give you some ideas.

Check FinalReport for an explanation of what my team did and why we did it. I implemented it all so I have a good understanding of what it can and can't do. If you have any questions or want any suggestions, you can PM me. I'm relatively comfortable with databases and am currently on co-op as a DBA assistant, so it's quite relevant to me.

EDIT: Will repost link momentarily with changes

If you'd like to see how I did a similar project, here's my code/readme:

https://drive.google.com/folderview?id=0B4KF681WVs2zU2tlV3k0TjVMX1E&usp=sharing

Take whatever you want. It worked out pretty well, but if I wanted an entirely secure and idiot proof database, it needs a lot more functions for checking and such. It's possible for a match to have more than two teams playing in it, stuff like that. It was my first implementation of a database so there are obviously flaws, but maybe it could give you some ideas.

Check FinalReport for an explanation of what my team did and why we did it. I implemented it all so I have a good understanding of what it can and can't do. If you have any questions or want any suggestions, you can PM me. I'm relatively comfortable with databases and am currently on co-op as a DBA assistant, so it's quite relevant to me.

EDIT: Will repost link momentarily with changes

#20

ykpegedese

1 Frags – +

The nitty gritty of normalization can get very hairy. Long story short, do it on data that shouldn't be normalized and you will get stuck with all the negatives with very little benefit.

What is the real purpose of this data? A lot of read/write? Analysis? How much data will there be? In the world of DW/BI, you rarely see normalization for a reason(long story short). Normalization trades performance for efficient use of space, read/write operations, etc. If analytics is your primary need, you shouldn't need to carry out normalization so heavily. If you're limited by hardware, you may need to consider going one way or the other even if the data you have and the purpose you need may require otherwise. If your data isn't going to be too big, denormalization isn't going to be too necessary either.

It seems that you will be doing analytics similarly to what BI people do. You should consider making your database similar to that of what you see in a DW and mix in normalization when necessary instead of taking normalization to the extreme. When I worked as a data warehouse consultant, getting this done correctly was one of the key concepts to master. Since you don't have a DW size of Loblaws but also will likely have decent amount of data needed for analytics, you shouldn't need to go with either extremes of normalization/denormalization. Keep in mind though, even databases that should carry out normalization often eventually need to be denormalized for performance reasons. I am now a SDE with specialty in ETL, and RDBMS, and denormalization is one of the things I have to do now because our database is getting too large.

The nitty gritty of normalization can get very hairy. Long story short, do it on data that shouldn't be normalized and you will get stuck with all the negatives with very little benefit.

What is the real purpose of this data? A lot of read/write? Analysis? How much data will there be? In the world of DW/BI, you rarely see normalization for a reason(long story short). Normalization trades performance for efficient use of space, read/write operations, etc. If analytics is your primary need, you shouldn't need to carry out normalization so heavily. If you're limited by hardware, you may need to consider going one way or the other even if the data you have and the purpose you need may require otherwise. If your data isn't going to be too big, denormalization isn't going to be too necessary either.

It seems that you will be doing analytics similarly to what BI people do. You should consider making your database similar to that of what you see in a DW and mix in normalization when necessary instead of taking normalization to the extreme. When I worked as a data warehouse consultant, getting this done correctly was one of the key concepts to master. Since you don't have a DW size of Loblaws but also will likely have decent amount of data needed for analytics, you shouldn't need to go with either extremes of normalization/denormalization. Keep in mind though, even databases that should carry out normalization often eventually need to be denormalized for performance reasons. I am now a SDE with specialty in ETL, and RDBMS, and denormalization is one of the things I have to do now because our database is getting too large.

#21

mansfield7

0 Frags – +

ykpegedeseThe nitty gritty of normalization can get very hairy. Long story short, do it on data that shouldn't be normalized and you will get stuck with all the negatives with very little benefit.

What is the real purpose of this data? A lot of read/write? Analysis? How much data will there be? In the world of DW/BI, you rarely see normalization for a reason(long story short). Normalization trades performance for efficient use of space, read/write operations, etc. If analytics is your primary need, you shouldn't need to carry out normalization so heavily. If you're limited by hardware, you may need to consider going one way or the other even if the data you have and the purpose you need may require otherwise. If your data isn't going to be too big, denormalization isn't going to be too necessary either.

It seems that you will be doing analytics similarly to what BI people do. You should consider making your database similar to that of what you see in a DW and mix in normalization when necessary instead of taking normalization to the extreme. When I worked as a data warehouse consultant, getting this done correctly was one of the key concepts to master. Since you don't have a DW size of Loblaws but also will likely have decent amount of data needed for analytics, you shouldn't need to go with either extremes of normalization/denormalization. Keep in mind though, even databases that should carry out normalization often eventually need to be denormalized for performance reasons. I am now a SDE with specialty in ETL, and RDBMS, and denormalization is one of the things I have to do now because our database is getting too large.

You make good points, but I can't see a TF2 league having such a large amount of data that he'd have to worry about denormalizing for performance reasons..I'd err on the side of storing properly normalized data to begin with and only denormalize if performance problems appear.

[quote=ykpegedese]The nitty gritty of normalization can get very hairy. Long story short, do it on data that shouldn't be normalized and you will get stuck with all the negatives with very little benefit.

What is the real purpose of this data? A lot of read/write? Analysis? How much data will there be? In the world of DW/BI, you rarely see normalization for a reason(long story short). Normalization trades performance for efficient use of space, read/write operations, etc. If analytics is your primary need, you shouldn't need to carry out normalization so heavily. If you're limited by hardware, you may need to consider going one way or the other even if the data you have and the purpose you need may require otherwise. If your data isn't going to be too big, denormalization isn't going to be too necessary either.

It seems that you will be doing analytics similarly to what BI people do. You should consider making your database similar to that of what you see in a DW and mix in normalization when necessary instead of taking normalization to the extreme. When I worked as a data warehouse consultant, getting this done correctly was one of the key concepts to master. Since you don't have a DW size of Loblaws but also will likely have decent amount of data needed for analytics, you shouldn't need to go with either extremes of normalization/denormalization. Keep in mind though, even databases that should carry out normalization often eventually need to be denormalized for performance reasons. I am now a SDE with specialty in ETL, and RDBMS, and denormalization is one of the things I have to do now because our database is getting too large.[/quote]

You make good points, but I can't see a TF2 league having such a large amount of data that he'd have to worry about denormalizing for performance reasons..I'd err on the side of storing properly normalized data to begin with and only denormalize if performance problems appear.

#22

shadowmatter

1 Frags – +

It's been awhile since I've actually read an ERD, but I've designed databases at multiple startups. Looking at it...

Things that look right:
- a league has multiple seasons
- a season has multiple divisions
- a division has a series of matches
- each match has two teams
- each team has multiple players
- each match has a series of rounds
- each round has a series of events

Things that look a little off:
- Rankings is not really an entity. I'd expect the Teams table to have a NumWins column and a NumLosses column. Whenever a new match (or row in the Matches table) is added, you could increment the appropriate column of each team (or row in the Teams table) in a transaction. The Teams table could even have in a column named WinPercentage that's computed as NumWins/(NumWins+NumLosses). Then you can create an index that sorts the rows in Teams by NumWins or WinPercentage in descending order. This yields the rankings.
- I'd expect each match to have one map, not each round. Assuming that a match is played in its entirety on 1 map.
- Statistics _probably_ isn't really an entity either. When you want to generate the statistics for a match or a round, you'd probably select the appropriate events (or rows from the Events table) in a query, and then apply aggregate functions like SUM or MAX or AVG to get the data that you want.

Things you won't have to worry about depending on where you stop designing:
- It's probably kinda hard to specify the columns of the Events table. Some events have players but no items (e.g. a player changing teams) and maybe even some events have items but no players (e.g. a health pack respawns). The more challenging aspect will be that some events have a lot of associated data. For example, if player1 kills player2, you could store player1 class, player2 class, player1 location, player2 location, player1 weapon, player2 damage. That's six data items associated with simply "player1 kills player2." Do you create a column for each one? Do you do this for each new event you add to the system? This is where we typically give up and have a "payload" column, that is a string, and stores JSON containing any arbitrary data.

On the topic of normalization, this is good. I went through my NoSQL phase and thought you had to denormalize everything. That's premature optimization, and probably isn't the point of this assignment.

It's been awhile since I've actually read an ERD, but I've designed databases at multiple startups. Looking at it...

Things that look right:
- a league has multiple seasons
- a season has multiple divisions
- a division has a series of matches
- each match has two teams
- each team has multiple players
- each match has a series of rounds
- each round has a series of events

Things that look a little off:
- Rankings is not really an entity. I'd expect the Teams table to have a NumWins column and a NumLosses column. Whenever a new match (or row in the Matches table) is added, you could increment the appropriate column of each team (or row in the Teams table) in a transaction. The Teams table could even have in a column named WinPercentage that's computed as NumWins/(NumWins+NumLosses). Then you can create an index that sorts the rows in Teams by NumWins or WinPercentage in descending order. This yields the rankings.
- I'd expect each match to have one map, not each round. Assuming that a match is played in its entirety on 1 map.
- Statistics _probably_ isn't really an entity either. When you want to generate the statistics for a match or a round, you'd probably select the appropriate events (or rows from the Events table) in a query, and then apply aggregate functions like SUM or MAX or AVG to get the data that you want.

Things you won't have to worry about depending on where you stop designing:
- It's probably kinda hard to specify the columns of the Events table. Some events have players but no items (e.g. a player changing teams) and maybe even some events have items but no players (e.g. a health pack respawns). The more challenging aspect will be that some events have a lot of associated data. For example, if player1 kills player2, you could store player1 class, player2 class, player1 location, player2 location, player1 weapon, player2 damage. That's six data items associated with simply "player1 kills player2." Do you create a column for each one? Do you do this for each new event you add to the system? This is where we typically give up and have a "payload" column, that is a string, and stores JSON containing any arbitrary data.

On the topic of normalization, this is good. I went through my NoSQL phase and thought you had to denormalize everything. That's premature optimization, and probably isn't the point of this assignment.

#23

sandwiches_time

0 Frags – +

mansfield7BenroadsI'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?Diagrams themselves aren't that important as long as you understand the concepts of normalization.

Pretty much what he said, I haven't touched a database diagram since leaving school. Not sure what software you're using for it but we mostly used visio and it was pretty decent. Do you get to choose what software you use?

[quote=mansfield7][quote=Benroads]I'm not expert by any means and there's a very good chance that my professor has been teaching us an outdated as fuck way of doing these diagrams. Anyway I think rounds would need to be weak to matches as they cannot really exist without a match?[/quote]
Diagrams themselves aren't that important as long as you understand the concepts of normalization.[/quote]

Pretty much what he said, I haven't touched a database diagram since leaving school. Not sure what software you're using for it but we mostly used visio and it was pretty decent. Do you get to choose what software you use?

#24

mansfield7

0 Frags – +

A great tool for doing database diagrams is Visual Paradigm. It has a free community edition...it's by far the best design/diagram tool I've used. I actually end up doing a lot of ERD's at my work, but it's just my preference to have it diagramed out before I start programming. Also VP let's you generate database scripts from the diagrams. Plus management loves visuals :)

A great tool for doing database diagrams is [url=http://www.visual-paradigm.com/]Visual Paradigm[/url]. It has a free community edition...it's by far the best design/diagram tool I've used. I actually end up doing a lot of ERD's at my work, but it's just my preference to have it diagramed out before I start programming. Also VP let's you generate database scripts from the diagrams. Plus management loves visuals :)

#25

Lexx_

0 Frags – +

@Rigel, thanks for the link! I'll definitely take a look and use your project as a source of inspiration. I'll PM you if I ever get stuck with anything DB-related.

@ykpegedese, you used a bunch of abbreviations that I have no clue about. I assume DW is data warhousing? I know RDBMS, but I don't know what BI, SDE and ETL are.

The real purpose of that data is to store the inputs of a machine learning/data mining application that I will write. The role of that will be to create regression models that predict the outcomes of a TF2 competition based on the individual and team-based skill of players in previous competitions.

@shadowmatter, thanks for the comments. You're probably right, Statistics shouldn't really be an entity as all of its attributes can be computed when needed. As for the Events table, I'm gonna have to think so more about its structure, however your idea of "payload" columns which store JSON strings containing any arbitrary data is quite interesting.

Here's a second update to the ERD. I'll update the first post as well for any newcomers to the thread.

@Rigel, thanks for the link! I'll definitely take a look and use your project as a source of inspiration. I'll PM you if I ever get stuck with anything DB-related.

@ykpegedese, you used a bunch of abbreviations that I have no clue about. I assume DW is data warhousing? I know RDBMS, but I don't know what BI, SDE and ETL are.

The real purpose of that data is to store the inputs of a machine learning/data mining application that I will write. The role of that will be to create regression models that predict the outcomes of a TF2 competition based on the individual and team-based skill of players in previous competitions.

@shadowmatter, thanks for the comments. You're probably right, Statistics shouldn't really be an entity as all of its attributes can be computed when needed. As for the Events table, I'm gonna have to think so more about its structure, however your idea of "payload" columns which store JSON strings containing any arbitrary data is quite interesting.

Here's a [url=http://postimg.org/image/h766ci6ft/]second update[/url] to the ERD. I'll update the first post as well for any newcomers to the thread.