Official Smogon University Usage Statistics Discussion Thread, mk.2

Status
Not open for further replies.
Oh, interesting. So by that logic, we can't really pull a list of "Pokemon A's top teammates", only the mons it has the most influence over?

Sorry, I'm not sure why, but making sense of this one part of the data set is causing me a lot of trouble. No matter how many calculations I try, normalizing against various factors, I can't get a list of "Teammates" that comes out the way I would expect.

That said, if I'm searching for the wrong result, that would explain my frustration thus far.

For example (referring to the 2016-06 VGC2016 data set):

Code:
Groudon-Primal 'usage': 0.655
Groudon-Primal Sum-of-Abilities: 4297.835

Xerneas 'usage': 0.534
Xerneas Sum-of-Abilities: 3507.517

Xerneas' entry in Groudon-Primal 'Teammates': 23.385

Kangaskhan-Mega 'usage': 0.603
Kangaskhan-Mega Sum-of-Abilities: 3962.796

Kangaskhan-Mega's entry in Groudon-Primal 'Teammates': 419.126
Is there a way to take this data and generate a normalized value that lets me rank Xerneas against Kangaskhan-Mega as the "better" Groudon-Primal teammate?
 
Grizzle, by "top teammates" you mean you want to know which Pokemon appear most with that mon, regardless of prior distribution (that is, you don't care about the Pokemon's usage by itself)?

My code isn't very well documented, I'll admit, but here's what's going on and how you can undo it:

  1. I have a "teammateMatrix" in a binary file that you don't have access to. This file stores the raw counts for "Pokemon X appeared with Pokemon Y." This would be the number you want.
  2. On line 147, I transform that value into one that captures only the boost or drop in Pokemon Y's appearance on a team given that Pokemon X is on that team. I do this by subtracting the expected count, i.e. the count you would see if teams were generated randomly from the usage numbers alone. So if Pokemon X's count is "count" and Pokemon Y's usage percentage is "usage[y]", you'd expect the count for "Pokemon X appeared with Pokemon Y" to be "count * usage[y]".
  3. Ergo: if what you want is to discount the prior distribution and just look at "what are the most common Pokemon to appear on a team with Pokemon X?", then you just need to add the prior back in.
The question is how you get "usage[y]", because we're talking percentages, not counts, and there's no "total count" in that file (wtf, Antar?*). You could always parse the usage stats table...

Edit: and you can't just sum abilities throughout, because I only do the top 200 mons.

Edit 2: *"tf" is that this report was intended just to be a more detailed version of the moveset stats, that is, each Pokemon was supposed to be considered independently. This was pretty shortsighted of me, and it'll be rectified as I rewrite this package.
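
Putting the three steps together with Grizzle's numbers, a minimal Python sketch might look like the following. It assumes the published 'Teammates' value is exactly the raw pair count minus "count * usage[y]", that the sum of abilities stands in for "count", and that 'usage' is a fraction of teams. The function name is made up for illustration and is not part of the stats package.

```python
def teammate_share(stored, count_x, usage_y):
    """Estimate the fraction of X's teams that also carry Y.

    stored  -- Y's entry in X's 'Teammates' (raw pair count minus the prior)
    count_x -- X's count (assumed here: the sum of X's ability counts)
    usage_y -- Y's usage as a fraction, e.g. 0.534
    """
    raw_pairs = stored + count_x * usage_y  # add the prior back in
    return raw_pairs / count_x


# Figures quoted above from the 2016-06 VGC2016 data set.
groudon_count = 4297.835
candidates = {
    "Xerneas": (23.385, 0.534),
    "Kangaskhan-Mega": (419.126, 0.603),
}
shares = {
    name: teammate_share(stored, groudon_count, usage)
    for name, (stored, usage) in candidates.items()
}
# Print the candidates from most to least common Groudon-Primal partner.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

Under those assumptions Kangaskhan-Mega lands at roughly 70% of Groudon-Primal teams and Xerneas at roughly 54%, so Kangaskhan-Mega ranks higher on the "appears most often with" measure.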
 
I'm working on a similar problem to Grizzle's, and I've been going through the json file. In your code, why is count (the sum of the instances of a Pokemon's abilities) not the same as the Raw count? Every Pokemon has exactly one ability, so they should sum to the same number. Are the ability numbers generated from the same samples as usage?
 
pokeprogrammer, this is covered by the third item in the FAQ:

3. What's this business with "Raw" and "Real?"
Jimera0, yeah--I should really include just a bit of text at the top of each Standard Stats post...
  • Usage % : Weighted
  • Raw: Unweighted
  • "Real": Only counts the Pokemon which actually appear in battle (Doubles not supported)
The reason for the name "real" is historic--back when I first took over the stats and then the running of PO, only the Pokemon that appeared in battle were recorded in the logs, so there was no way to actually *get* the full team stats. When I modified PO to generate logs with full team info in them, we were left with a decision regarding which stats to use, and the argument was that counting only Pokemon appearing in battle was somewhat more legit, because that corresponded to actual, or "real" usage (that argument lost out in the end).
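
To see why a weighted tally and a raw tally over the same teams need not match, here is a toy Python sketch. It assumes every team carries some weight and that the ability counts are accumulated with those weights, which is my reading of the FAQ rather than something stated outright. The weights and Pokemon names below are invented, and the actual weighting rule is not given here.

```python
from collections import Counter


def raw_and_weighted(teams):
    """Tally each Pokemon once per team (raw) and once per team weight (weighted).

    teams -- iterable of (weight, set_of_pokemon) pairs; the weights are
             hypothetical stand-ins for whatever the stats scripts use.
    """
    raw = Counter()
    weighted = Counter()
    for weight, mons in teams:
        for mon in mons:
            raw[mon] += 1            # unweighted: every team counts as 1
            weighted[mon] += weight  # weighted: every team counts as its weight
    return raw, weighted


# Invented sample: three teams with different weights.
sample_teams = [
    (1.0, {"Mon A", "Mon B"}),
    (0.25, {"Mon A"}),
    (0.0, {"Mon B"}),
]
raw, weighted = raw_and_weighted(sample_teams)
print(raw["Mon A"], weighted["Mon A"])
print(raw["Mon B"], weighted["Mon B"])
```

Both Pokemon show up on two teams, yet their weighted totals differ from the raw total and from each other, which is the kind of gap being asked about.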
 
Antar, if we're going to count the number of teams that use obscure playstyles such as Magic Room and Gravity, I think Terrains should be added to the list as well.

Electric Terrain team: At least one Pokemon has the move Electric Terrain without the move Nature Power, two Pokemon have the move Electric Terrain, or one Pokemon has whatever Tapu Koko's ability is.
Misty Terrain team: At least one Pokemon has the move Misty Terrain without the move Nature Power, or two Pokemon have the move Misty Terrain.
Grassy Terrain team: At least one Pokemon has the move Grassy Terrain without the move Nature Power or the ability Grass Pelt, or two Pokemon have the move Grassy Terrain.

Requiring Pokemon to not have certain moves helps differentiate stand-alone Terrain sweepers from Terrain-using support Pokemon. Stand-alone Terrain sweepers are a thing in Other Metagames: in Balanced Hackmons, a Mega Ampharos can run Prankster Electric Terrain and Nature Power to get priority Thunderbolts. Misty Terrain can also be used in VGC Doubles or Battle Spot Triples to block Dark Void (although it is generally outclassed by Safeguard). If you see, say, a Xerneas using Misty Terrain in VGC, it will likely be there to support the team by blocking status, not to power itself up.

tl;dr Terrains are niche, but deserve to be recorded under team type.
 
Hey guys, my apologies if this doesn't belong here, but as someone who really enjoys statistics as a whole, could someone explain this to me?

AndrewB73
Joined: Jun 27, 2016

Ratings
Official ladder     Elo    GXE                  Glicko-1     W    L
nu                  1103   (more games needed)               3    1
ou                  1286   64.7%                1617 ± 77    17   7
pu                  1132   56.3%                1549 ± 79    11   8
ru                  1089   54.5%                1535 ± 76    11   8
ubers               1472   73.9%                1697 ± 41    64   44
uu                  1206   64.5%                1616 ± 87    11   4

Unofficial ladder   Elo    GXE                  Glicko-1     W    L
nususpecttest       1154   (more games needed)               5    0
pususpecttest       1000   (more games needed)               0    1
rususpecttest       1022   (more games needed)               1    2
uususpecttest       1074   42.3%                1440 ± 85    4    7

These are my stats on Showdown's ladder. What do things like Elo, GXE, and Glicko-1 mean?
 

Bad Ass

Howdy. I'm trying to build something for RBY that takes in the first n revealed Pokemon and spits out the best guess of what the last 6-n Pokemon are. I'm pretty confused about how exactly the teammate stats are calculated. I tried correlating the points (from here http://www.smogon.com/stats/2016-08/chaos/gen1ou-1760.json) with the % increase (from here http://www.smogon.com/stats/2016-08/moveset/gen1ou-1760.txt). Interestingly, the ratio was constant for any one Pokemon but differed from Pokemon to Pokemon (e.g. for Golem it's about .16% change per point in the json file, but for Alakazam it's about .08%).

How can I use these data to make a meaningful metric like "47% of teams that have a Lapras also have an Alakazam"? Or alternatively, "a team that has Lapras is 4.5% more likely to have an Alakazam than the average team"?

Antar

thanks in advance for any time taken to help me out
 

Bad Ass

So in the json file from my last post, Starmie's teammate number for Chansey is -182.30732025389. Clearly this is some kind of weight/modifier, not a raw "one percentage minus another", so a larger negative number corresponds to something like "teams with Starmie are moderately less likely [probably around 8%] to also have a Chansey". What I meant to ask was: what's the formula for turning that -182 from "moderately less likely" into "x.x% less likely"?
 
Bad Ass, IIRC, divide by count (sum of abilities).

Check your numbers against what's in the reports.
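
As a rough Python sketch of that division, assuming a Pokemon's entry in the json exposes an "Abilities" mapping and a "Teammates" mapping as discussed in this thread (the helper name and the sample figures are invented for illustration, not pulled from a real report):

```python
def teammate_shifts(entry):
    """Return {teammate: stored / count} for one Pokemon's entry.

    entry is assumed to look like
    {"Abilities": {name: count, ...}, "Teammates": {name: stored, ...}}.
    "count" is taken to be the sum of the ability counts.
    """
    count = sum(entry["Abilities"].values())
    return {mate: stored / count for mate, stored in entry["Teammates"].items()}


# Hypothetical numbers, not taken from any real report.
example_entry = {
    "Abilities": {"Ability A": 1500.0, "Ability B": 500.0},
    "Teammates": {"Mon Y": -160.0, "Mon Z": 90.0},
}
shifts = teammate_shifts(example_entry)
# List teammates from most suppressed to most boosted.
for mate, shift in sorted(shifts.items(), key=lambda kv: kv[1]):
    print(f"{mate}: {shift:+.1%}")
```

If the stored value really is the raw pair count minus "count * usage[y]", each result is the shift, in percentage points, between how often the teammate shows up alongside this Pokemon and how often it shows up overall. That would also explain why the per-point ratio differs between Pokemon: each one is divided by its own count.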

I'm sorry this is so obscure and poorly documented. Things will be a lot better post-rewrite.
 
Stats for the month will be up in the next few days. I need to add the new CAP data and rerun that tier. May do a partial upload before that.
 
Hello, I have some questions

1. What's the difference between "Raw" in a usage file (ex. 1070733 for Landorus-Therian in http://www.smogon.com/stats/2016-09/ou-1695.txt) and "Raw count" in a moveset file (ex. 1178902 for Landorus-Therian in the same month)? My best guess is that the raw count in the moveset file is including too-short battles, and the raw count in the usage file is excluding them?

2. In the moveset file, in the Checks and Counters section, what do the three numbers on a Pokemon's first line represent? I mean the 79.978, 83.14, and 0.79 in a line like "Manaphy 79.978 (83.14±0.79)". I tried figuring it out from the code, but failed pretty miserably.

Thanks for any answers you can give, and thanks for maintaining this great resource for so long!
 
Stats for the month (sans CAP) are now up. Working on a CAP rerun now. Drk Pwnr, please read the FAQ in the second post for the answer to your first question. I'll give you an answer to #2 in a bit. Feel free to ping me again if you don't get a response in a day or two.
 