Randomitsuki 2,819 Posted January 10, 2010

Hi there, stattos! I've spent the last six months in nerdistan. That means I've wasted several hundred hours on some sumo projects of epic proportions. One of them is long-term career predictions, something that was already of particular interest to me in this old thread. My predictions back then were rather sketchy, and I promised to repeat my analyses the hard way. And this I did. What follows now is a simplified description of my method. Still, it's a little tech-y, so feel free to skip it.

Now what have I done? I collected rikishi data. Not a few, but actually about 174,000 data points, courtesy of the Sumo Reference. Each data point consists of the following entries: basho, rikishi name, age of the rikishi at the basho in question, number of bashos of this particular rikishi at that time, current rank, current strength rating, best rank so far, highest strength rating so far, best career rank, basho number when the career high was reached for the last time, and some more stuff. In order to make ranks comparable across different times, I have converted each rank to a value between 0 (highest rank) and 100 (lowest rank; actually, the details are more complicated, but I'll spare you that).

I have collected these data for all rikishi that a) have started since the advent of the six-basho-per-year rule (Aki 1957); b) have gone intai meanwhile; and c) have birth dates listed at the Doitsubase.

All these data were listed by cohorts. A cohort is a comparable group of rikishi with the same age and the same number of bashos under their belt. For instance, there is a cohort of 15-year-olds in their second basho (with more than 2200 entries!).

The next step involved a rather complicated analysis of each cohort. The goal of this analysis was to determine whether there are significantly different sub-groups within a cohort. Let me illustrate this by way of an example, the cohort of 14-year-old rikishi in their 3rd basho.
This cohort consists of 112 different rikishi. Their best career rank (converted) ranged from 0 (later Yokozuna Kitanoumi in Natsu 1967) to 89 (a guy called Mabuchi in Hatsu 1966). The best predictor for this cohort is current rank, and the current ranks (converted) in their third basho ranged from 76 (also for Kitanoumi) to 100 (for several guys). The output that my program gave me for this cohort is as follows:

14 3 0 11 5.5 85.7
14 3 12 40 26 89.1
14 3 41 100 70.5 91.7

In other words, there were three statistically different sub-groups within the cohort. The best sub-group were rikishi whose best final career rank (converted) ranged from 0 to 11 (with a mid-point of 5.5), and whose current rank was 85.7 on average in their third basho. The second sub-group were rikishi with a best rank between 12 and 40 (with a mid-point of 26) - their current rank in the third basho was 89.1 on average. And the third group were the rikishi with a career best rank between 41 and 100 (with a mid-point of 70.5) and a current rank of 91.7.

If you compare the three numbers in the last column, you can determine a boundary between first and second group (viz. the mid-point between 85.7 from first group and 89.1 from second group = 87.4). In the same way, there is a boundary between second and third group (at 90.4). And this gives you all the necessary ingredients for a prediction. For instance, if there now were a 14-year-old in his third basho whose current rank is between 0 (highest possible rank) and 87.4 (border between first and second group), we could predict that he will be a member of the first group. And since this group has an average of 5.5, we would predict a best career rank of 5.5 - converted to our current banzuke this would amount to a Maegashira 15 career-high. Comprende?

In one additional step I determined the average "zenith" of rikishi within each sub-group.
For example, rikishi from the first sub-group above peaked in their 62nd basho, rikishi from the second sub-group peaked in their 45th basho, and rikishi from the third sub-group peaked in their 18th basho.

The results of all this can be seen below. The list shows the rikishi names of the current banzuke, their current rank, their age, their experience (number of bashos), their highest career rank so far, and their highest predicted career rank. If the last column displays "Decline", the rikishi in question is past his zenith, so a further improvement of current best rank is not to be expected.

Feel free to find funny faults in these predictions, and take it all with a grain of salt. For instance, there is a pretty high number of lower division rikishi for whom an Ozeki rank is predicted, and there is that elusive "Yokozuna" prediction for Honda. While I fully know that these isolated predictions draw the most attention, let me state that my algorithms don't care about that. In other words, if a guy like Honda made it to Sekiwake, the prediction would be quite accurate in numerical terms. The accuracy lies in the general trends.

Moreover, the predictions vary from basho to basho, as they only depend on the relative cohort. E.g., while Goeido is currently listed as being on track for Yokozuna, in Haru basho the prediction will be Ozeki no matter what, and in Natsu Basho it will be Sekiwake no matter what (even if he were on a career-high Sekiwake in both bashos).

Some odd predictions have to do with rikishi who are obviously past their prime, but are still not old enough to be recognized by my algorithms as has-beens (e.g. Ryuo, Hochiyama, Chiyohakuho, Masatsukasa, Tamaasuka). Other predictions that look a little strange (for instance, a decline for Baruto already) have to do with small sample sizes.
174,000 data entries sounds like a lot, but there are lots of cohorts with the minimum number of only 4 rikishi, and here the predictions become much more sketchy. In those cases where the prediction says "no data", the number of data points is even smaller than four.

In any case, I am confident in saying that over the whole banzuke you'll have never seen more accurate overall predictions than these. (Sign of approval...)

Enjoy. Discuss. Have a laugh.

From now on I will post the updated long-term career predictions on shonichi of each basho (I won
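For readers who want to retrace the cohort example from the opening post (14-year-olds in their 3rd basho), here is a minimal Python sketch. It only uses the three output rows quoted in the post. The names `SUBGROUPS`, `boundaries` and `predict_best_rank` are made up for illustration, and the handling of a rank that falls exactly on a boundary is an assumption, since the post does not say which side it goes to. The original work was done in Visual Basic and SPSS, so this is not the author's code.

```python
from bisect import bisect_left

# Rows copied from the post for the cohort "age 14, 3rd basho":
# (best-rank range low, range high, mid-point, mean current rank in 3rd basho)
SUBGROUPS = [
    (0, 11, 5.5, 85.7),
    (12, 40, 26.0, 89.1),
    (41, 100, 70.5, 91.7),
]


def boundaries(subgroups):
    """Mid-points between the mean current ranks of neighbouring sub-groups."""
    means = [group[3] for group in subgroups]
    return [round((a + b) / 2, 1) for a, b in zip(means, means[1:])]


def predict_best_rank(current_rank, subgroups):
    """Predicted best career rank (converted scale, 0 = highest rank).

    Assumption: a rank exactly on a boundary is assigned to the better group.
    """
    cuts = boundaries(subgroups)
    index = bisect_left(cuts, current_rank)
    return subgroups[index][2]


print(boundaries(SUBGROUPS))             # the two cut points described in the post
print(predict_best_rank(80, SUBGROUPS))  # a current rank better than the first cut
```

A rank on the converted scale below the first cut lands in the first sub-group, so the prediction is that group's mid-point, as in the post's worked example.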
Fay 1,677 Posted January 10, 2010

Thank you Randomitsuki. This is always interesting and I like it. Of course I like it more when the predictions meet my expectations like Kakuryu, Fujimoto, Chiyoarashi .... I ignore it when the predictions are contrary to my feelings (Sign of approval...). But I'm a bit concerned about my favourite twins Tamatoryo and Tamaseiryo, one goes up to Juryo, the other one is stuck in Jonidan :-( Time to support Seiryo a bit ;-) Anyway thanks for the great work!!!
Kintamayama 44,965 Posted January 10, 2010

I appreciate the great effort but as I have stated in the past: skeptic, skeptic, skeptic. Nothing personal, just anti-statistic from the core. I could have saved you a lot of time though - anyone over 30 = decline. Took me 3 minutes.. Or no-data (what does that mean for veterans like Miyabiyama/Tamanoshima/Tochinonada?? (forgive me if you explained it, but you allowed us to 'skip it'.. )
Randomitsuki 2,819 Posted January 10, 2010

"Or no-data (what does that mean for veterans like Miyabiyama/Tamanoshima/Tochinonada?? (forgive me if you explained it, but you allowed us to 'skip it'.. )"

For instance, Miyabiyama is 32 in his 70th basho. There simply haven't been enough Tsukedashi entrants in the past who have been 32 in their 70th basho, so my algorithms do not provide enough numbers.

BTW, not everyone over 30 is on a decline (see Chiyotaikai, for whom my algorithm predicted a return to Ozeki - probably the only entity on earth to do so...)
Jakusotsu 5,902 Posted January 10, 2010

I haven't delved into all of the gory details, but predicting Asasekiryu for Sekiwake whereas everyone else around him is on a decline seems more than a bit off to me.
Kintamayama 44,965 Posted January 10, 2010

"BTW, not everyone over 30 is on a decline (see Chiyotaikai, for whom my algorithm predicted a return to Ozeki - probably the only entity on earth to do so...)"

I wasn't counting retired rikishi..
Randomitsuki 2,819 Posted January 10, 2010 (edited)

"I haven't delved into all of the gory details, but predicting Asasekiryu for Sekiwake whereas everyone else around him is on a decline seems more than a bit off to me."

As I said, there are lots of random fluctuations in the predictions (although they are on a relatively small scale). Asasekiryu is in his 61st basho. The zenith of rikishi in the corresponding sub-group of his cohort is 65.7, so there are four bashos until he'll decline. However, next basho he is in a different sub-group of a different cohort (viz. 28-year-old career-high Sekiwake in their 62nd basho). For that particular sub-group the zenith is around the 69th basho. In Natsu Asasekiryu will be in another cohort (that will predict a zenith for the 61st basho, so the prediction will be "Decline"), and in Nagoya he'll be in a sub-group that says his decline is at the 71st basho...

All this does not mean that the algorithm is complete bollocks. Perhaps you look at too much detail in a much larger picture. For instance, the main purpose for me was to provide an estimate for guys in the lower divisions. These stats can be rewarding by showing that a 17-year-old who has trouble in Sandanme often has a better career than a guy like Minami (who shot into the upper half of Makushita in a few bashos, but his higher age is both the reason for the speedy promotion and the sudden peak) - of course that is something Asashosakari said again and again, but now you'll see the corresponding numbers.

Edited January 10, 2010 by Randomitsuki
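The zenith arithmetic in the Asasekiryu reply can be checked with a few lines of Python. This is only a sketch under assumptions: the function name `zenith_status` is invented here, the post never states the exact rule that triggers "Decline" (the code assumes it fires once the rikishi's basho count exceeds the sub-group's average zenith), and the basho counts for Haru, Natsu and Nagoya are simply 61 plus one per tournament.

```python
import math


def zenith_status(experience, zenith):
    """Compare a rikishi's basho count with his sub-group's average zenith.

    Returns "Decline" if the zenith already lies behind him (assumed rule),
    otherwise the number of whole bashos left before the zenith.
    """
    remaining = zenith - experience
    if remaining < 0:
        return "Decline"
    return math.floor(remaining)


# (basho, Asasekiryu's basho count, zenith of the sub-group he falls into)
snapshots = [
    ("Hatsu", 61, 65.7),
    ("Haru", 62, 69),
    ("Natsu", 63, 61),
    ("Nagoya", 64, 71),
]

for basho, experience, zenith in snapshots:
    print(basho, zenith_status(experience, zenith))
```

The point of the sketch is the one made in the post: because the rikishi moves to a different cohort every basho, the zenith he is measured against jumps around, and so does the label.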
Randomitsuki 2,819 Posted January 10, 2010

By the by, I did some accuracy testing, nicknamed the "Kintamayama Litmus Test" (referring back to that old thread mentioned above). The test says how good my algorithm was at predicting future Sekitori. And here are the results:

A random prediction for Makushita would have a 34% accuracy. My algorithm has a 36% accuracy (in other words, my numbers are really bad when it comes to predicting future sekitorihood for Makushita rikishi). The random prediction for Sandanme would be 8%, while my algorithm has a 46% accuracy. The random prediction for Jonidan rikishi would be 2%, while my algorithm has a 38% accuracy. For Jonokuchi, the numbers are 1% vs. 20%.

Bottom line: check Sandanme and below, and the beauty of my work will unfold in front of your very eyes. Or maybe not...
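One way to read the "Kintamayama Litmus Test" numbers is as a ratio of the algorithm's accuracy to the random baseline for each division. The sketch below does just that with the percentages quoted in the post. The names `RESULTS`, `lift` and `LIFTS` are illustrative, and the post does not define how "accuracy" was measured, so the ratio is only a rough comparison and not part of the original test.

```python
# Division -> (random accuracy in %, algorithm accuracy in %), as quoted in the post
RESULTS = {
    "Makushita": (34, 36),
    "Sandanme": (8, 46),
    "Jonidan": (2, 38),
    "Jonokuchi": (1, 20),
}


def lift(random_pct, algorithm_pct):
    """How many times better than the random baseline the algorithm scored."""
    return algorithm_pct / random_pct


LIFTS = {
    division: round(lift(random_pct, algorithm_pct), 2)
    for division, (random_pct, algorithm_pct) in RESULTS.items()
}

for division, value in LIFTS.items():
    print(f"{division}: {value}x the random baseline")
```

Seen this way, the Makushita figure is barely above chance, while the gain over the baseline grows as you go down the divisions, which matches the "check Sandanme and below" bottom line.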
Taka 33 Posted January 10, 2010 (edited)

Great work Randomitsuki! Surely a "Sumo Prospectus" along the lines of what we see in baseball with Nate Silver's PECOTA model is only a matter of time now! (Sign of approval...)

Edited January 10, 2010 by Taka
Asashosakari 19,172 Posted January 10, 2010 (edited)

"BTW, not everyone over 30 is on a decline (see Chiyotaikai, for whom my algorithm predicted a return to Ozeki - probably the only entity on earth to do so...)"

"I wasn't counting retired rikishi.."

Are you replacing Uchidate in the YDC?

"A random prediction for Makushita would have a 34% accuracy. My algorithm has a 36% accuracy (in other words, my numbers are really bad when it comes to predicting future sekitorihood for Makushita rikishi)."

In fairness, anecdotal evidence (= what I know about my own opinions) points towards "informed opinion" being just as bad. It's a real crapshoot to predict who will get stuck with a high rank in the single-digit makushitas, and who will go higher into sekitoriland.

Yet to delve into the whole introductory War and Peace post, but having been privy to the effort from time to time, Randomitsuki already knows I'm in awe of what he's produced here. (Sign of approval...)

Edited January 10, 2010 by Asashosakari
Kintamayama 44,965 Posted January 10, 2010

"BTW, not everyone over 30 is on a decline (see Chiyotaikai, for whom my algorithm predicted a return to Ozeki - probably the only entity on earth to do so...)"

"I wasn't counting retired rikishi.."

"Are you replacing Uchidate in the YDC?"

No, but he's gone. Some guy dressed as him showed up today but it may as well have been me up there.
Gusoyama 99 Posted January 10, 2010

At first I was a bit surprised by the large number of "Decline" guys in the upper 2 divisions, but I guess that makes sense, because it doesn't mean that they'll keep going down, it just means they won't get up to their highest rank again. This is a good indicator of the anecdotal "Most guys in Makuuchi make it to at least Komusubi once."

Might I ask how you mined all that data from the database?
Randomitsuki 2,819 Posted January 10, 2010

"Might I ask how you mined all that data from the database?"

With a shovel? I don't know exactly what data mining is, but I have opened every single banzuke file from Sumo Reference, cut and pasted stuff into Excel, and went on from there. Most programs I have written were in Visual Basic, but I've also been using a statistics program called SPSS for some purposes.
Fay 1,677 Posted January 10, 2010

"Might I ask how you mined all that data from the database?"

"With a shovel? I don't know exactly what data mining is, but I have opened every single banzuke file from Sumo Reference, cut and pasted stuff into Excel, and went from there on. Most programs I have written were in Visual Basic, but I've also been using a statistics program called SPSS for some purposes."

I also used SPSS last week but I only found out which customers will decline ... nothing about Tamaseiryo (Sign of approval...)
Harry 67 Posted January 10, 2010

Fun stuff! Go Ryuden! I hope the algorithm is correct or even conservative about Tochiyashiki but I have my doubts about my best adoptee to date.
Asashosakari 19,172 Posted January 10, 2010

"Some odd predictions have to do with rikishi who are obviously over their prime, but are still not old enough to be recognized by my algorithms as has-beens (e.g. Ryuo, Hochiyama, Chiyohakuho, Masatsukasa, Tamaasuka)."

Who are you to doubt Masatsukasa?! (Shaking head...)

"E.g., while Goeido is currently listed as being on track for Yokozuna, in Haru basho the prediction will be Ozeki no matter what, and in Natsu Basho it will be Sekiwake no matter what (even if he were on a career-high Sekiwake in both bashos)."

In the same vein, I'd be curious how volatile the prediction for Tochiozan is. Career high only at komusubi here...well, it confirms my scepticism about him, I guess, but more objectively it's a bit of a surprise.

Some fun data browsing:

Career-high juryo rikishi predicted to reach makuuchi (sorted by increasing age): Gagamaru - M8, Myogiryu - sekiwake, Dewaotori - sekiwake (!), Daishoumi - M13, Hoshihikari - M3 (I'd like to exchange him for Masatsukasa in the quote above...), Daiyubu - M13, Hoshikaze - M11

Career-high juryo rikishi NOT predicted to reach makuuchi: Hokutokuni, Okinoumi, Tokushinho, Kirinowaka, Sokokurai, Tokusegawa, Sotairyu, Kanbayashi, Kotoyutaka, Sagatsukasa, Shirononami, Dairaido, Yotsuguruma, Toyonokuni, Kotokuni, Kyokunankai, Tochifudo, Wakatenro, Yoshiazuma, Daishoyama, Furuichi, Hokutoiwa, Yanagawa

I'm surprised by the scepticism about Okinoumi and Tokusegawa - especially compared to Hoshikaze who's older and deeper into his career - but I guess it's still possible they'll pull a Senshuyama and get stuck in high juryo. Not much to quibble about the rest of this list...maybe there'll be a slight surprise (Kirinowaka?), but overall I'd agree with those assessments (including the mid-juryo high rank for Sokokurai).

The first list is quite a mixed bag though.
Plenty of hyper-optimism about guys who look like they've already fallen apart - I guess it's always hard to assess just how much a deep fall from the highest-rank-to-date serves to diminish a guy's career prospects. On the other hand, M8 for Gagamaru seems surprisingly low...I have a nagging feeling that Fay (I think it was her) is right and Gagamaru's going to eat himself right out of the sekitori ranks, but I wonder what factors are pulling down the analytical assessment here (just small sample size?). He's still rather young both in age and career length. Myogiryu and Hoshikaze seem about on-target given current knowledge...maybe Myogiryu will be that elusive 20th college sekiwake after all.
Asashosakari 19,172 Posted January 10, 2010 (edited)

"Fun stuff! Go Ryuden!"

Right on. :-) I'm rather happy that no less than two guys I'm on the bandwagon for (Ryuden and Kawanari) are predicted for an ozeki finish here, now they'll just have to follow up. (Shaking head...)

On a related note...

Shikona     Rank    Age  Exp  Best Rank      Prediction
==========================================================
Takanoiwa   Ms13w   19   7    Makushita 13   Ozeki

I'll boldly predict he'll be exhibit #239 for my belief that career-high rank is not quite useful when it deviates significantly from the career-high KK rank. (Kaisei will be exhibit #238.)

Edited January 10, 2010 by Asashosakari
Asashosakari 19,172 Posted January 10, 2010

"By the by, I did some accuracy testing, nicknamed the 'Kintamayama Litmus Test' (referring back to that old thread mentioned above). The test says how good my algorithm was in predicting future Sekitori. And here are the results: A random prediction for Makushita would have a 34% accuracy. My algorithm has a 36% accuracy (in other words, my numbers are really bad when it comes to predicting future sekitorihood for Makushita rikishi). The random prediction for Sandanme would be 8%, while my algorithm has a 46% accuracy. The random prediction for Jonidan rikishi would be 2%, while my algorithm has a 38% accuracy. For Jonokuchi, the numbers are 1% vs. 20%. Bottom line: check Sandanme and below, and the beauty of my work will unfold in front of your very eyes. Or maybe not..."

If you're up to it, I'd love to see a past snapshot (quite a ways back, say, from Hatsu 2000) so we can see what the biggest hits and misses have turned out to be. The algorithm may still struggle with the makushita level, but perhaps that just means there are more factors yet to be identified, and the combined brainpower of the forum might spot a few things.
Asashosakari 19,172 Posted January 10, 2010

And the comment onslaught continues...

Young makushita rikishi predicted not to reach juryo: the youngest one is Kairyu, 19 years old (for another two months) and freshly at his highest rank of Ms59, predicted for Ms11. That one looks pretty surprising, the next-older few rikishi not so much - mostly 21- and 22-year-olds who also have only cups of coffee in makushita, i.e. a high rank less than Ms45. Biggest outlier: Kagamio, 21 years (until next month), high rank Ms23, prediction Ms4. He's one I'd love to see rolling predictions for...for a while he certainly looked much more promising than he does now. (The fact that he hasn't put on any weight whatsoever since being already in high sandanme 4 years ago probably hasn't helped.)

The algorithm rapidly gets more sceptical as soon as guys are 23+ years old...Arawashi (Ms16->Ms9), Kitaharima (Ms18->Ms11), Nionoumi (Ms16->Ms3), Sasaki (Ms18->Ms3), etc. That's more sceptical than I would be, but it's quite possible that my own thinking needs adjusting here. Perhaps interestingly, Minami is also already toast by the predictatron (25 years, Ms5->Ms3). It's not easy being a collegiate rikishi...if you're not a sekitori yet after three years (barring big injuries taking you out of competition altogether), you're unlikely to ever get more than a cup of coffee in juryo.

On the other hand, the previously observed hyper-optimism manifests itself here, too, as some rather old rikishi are projected for surprisingly high future ranks: Takateru (26 years, Ms1->J9) may just be the flipside of the Minami projection, but Amuru (26 years, Ms12->M13) is quite an interesting outlier. There's no "foreign rikishi do better" parameter, is there? (Shaking head...)

And perhaps as a warning (to myself as well) to not take these snapshots too seriously...
Rendaiyama, 22 years, 24 basho, Ms46->Ms17
Tochihiryu, 22 years, 24 basho, Ms43->J3

(Just to make sure - for that career length, the predictor being used is the high rank, not the current rank, right?)
Fay 1,677 Posted January 10, 2010

I'm sure Tokusegawa will make it to Makuuchi, no doubt about it for me, I will fight for him :-). The predictions for Gagamaru and Hoshikaze are much too high in my opinion, both will be lucky if they stay in Juryo this year. I don't think Tochinoshin will make it to Ozeki but I have higher expectations for Tamawashi than predicted. No chance for Asasekiryu and Homasho for sekiwake in my opinion and Maegashira 3 is much too high for Wakakoyu. But all this is not based on any statistics at all but only on watching them from time to time. I'd love to see Masatsukasa as Sekiwake.

But I see I have a lot of work to do with my adoptee Hokazan (didn't you predict him as Juryo guy some months ago?) Sandanme? Can't believe this, he should be the next sekitori from Miyagino (Shaking head...)

I look forward to your next predictions.
Fay 1,677 Posted January 10, 2010 (edited)

And as Asashosakari mentioned two other favourites of mine ... Kairyu is surely one to watch and I think he will reach at least Juryo. I always watched Kagamio closely and I nearly lost hope for him, but the impression I had in the last bashos made me believe I should wait a bit longer and shouldn't give up so soon (Shaking head...)

Edited January 10, 2010 by Fay
shimodahito 315 Posted January 10, 2010

very interesting.... i'm even tempted to change some of my stable game rikishi. good work.... and i appreciate you sharing and welcoming critics and comments. one thing is 100%: if i applied for a visa to travel to nerdistan it would certainly be denied! -shimodahito
yorikiried by fate 2,016 Posted January 10, 2010

"Might I ask how you mined all that data from the database?"

"With a shovel? I don't know exactly what data mining is, but I have opened every single banzuke file from Sumo Reference, cut and pasted stuff into Excel, and went from there on. Most programs I have written were in Visual Basic, but I've also been using a statistics program called SPSS for some purposes."

"I also used SPSS last week but I only found out which customers will decline ... nothing about Tamaseiryo (Shaking head...)"

When I was forced to use SPSS back in uni days, the only declining customer was me.

Hey Random, glad to see that there are others out there with an insatiable autistic streak! Keep it coming.
Randomitsuki 2,819 Posted January 10, 2010

"E.g., while Goeido is currently listed as being on track for Yokozuna, in Haru basho the prediction will be Ozeki no matter what, and in Natsu Basho it will be Sekiwake no matter what (even if he were on a career-high Sekiwake in both bashos)."

"In the same vein, I'd be curious how volatile the prediction for Tochiozan is. Career high only at komusubi here...well, it confirms my scepticism about him, I guess, but more objectively it's a bit of a surprise."

Indeed it is. It's a little oddity. I guess it has to do with a relatively unusual starting age, so his cohort isn't exactly huge. And still, it features one rikishi who later became Yokozuna, four later Sekiwake, and three later Komusubi, so in principle, predictions could have been higher for Tochiozan (who is in the best sub-group of his cohort). However, some of the guys in his sub-group were relatively late bloomers (Washuyama, Fukunohana, and particularly Tamanofuji - no wonder, as he spent two and a half years of his career out of Kyokai...). As a consequence, at this career stage those guys who made it big could not be differentiated from many guys who ended up in Juryo. In fact, for this particular cohort there is no statistically detectable difference between guys who finally made it to Yokozuna and guys who ended up at about M12.

But this is volatility at its best. In the next few bashos, Tochiozan will be in bigger and more appropriate cohorts, and even if he should fall to Juryo or Makushita, his prediction will be Ozeki.

"I'm surprised by the scepticism about Okinoumi and Tokusegawa - especially compared to Hoshikaze who's older and deeper into his career - but I guess it's still possible they'll pull a Senshuyama and get stuck in high juryo."

Good catch! First of all, there might be a small glitch in my program - if I look up the number for Tokusegawa by myself, he should end up at Maegashira 3. But glitch or not - this is definitely an area for improvement.
Right now, I put the rikishi into categories, and assign the mid-point of each category to them. In other words, for Tokusegawa the prediction is either Maegashira 3 (if he is assigned to the first sub-group of this cohort) or Juryo 8 (if he is assigned to the second sub-group). He missed the cut, and so he gets the "Juryo 8" label (which of course is ridiculous, as he has already achieved Juryo 1). I should definitely find a way to use a floating prediction rather than put rikishi into fixed categories.
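The post leaves open what a "floating prediction" would look like. One possible reading, offered purely as an assumption and not as the author's planned method, is to interpolate linearly between the sub-group mid-points instead of snapping to one of them, and to never predict a career high worse than the rank already achieved. The sketch below uses the only sub-group numbers given in the thread (the cohort of 14-year-olds in their 3rd basho), since the figures for Tokusegawa's cohort are not posted; `POINTS`, `floating_prediction` and `cap_by_best_so_far` are hypothetical names.

```python
# (mean predictor value of the sub-group, mid-point of its best-rank range),
# taken from the age-14 / 3rd-basho cohort; converted scale, 0 = highest rank.
POINTS = [(85.7, 5.5), (89.1, 26.0), (91.7, 70.5)]


def floating_prediction(value, points):
    """Linear interpolation between sub-group mid-points (an assumed technique).

    Values outside the range of sub-group means are clamped to the nearest end.
    """
    if value <= points[0][0]:
        return points[0][1]
    if value >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            share = (value - x0) / (x1 - x0)
            return y0 + share * (y1 - y0)


def cap_by_best_so_far(prediction, best_so_far):
    """A predicted career high can never be worse than a rank already reached."""
    return min(prediction, best_so_far)


print(floating_prediction(88.0, POINTS))                          # somewhere between 5.5 and 26
print(cap_by_best_so_far(floating_prediction(91.0, POINTS), 30))  # capped at the rank already reached
```

With interpolation, a rikishi who narrowly misses a cut gets a prediction just below the better group's mid-point instead of dropping a whole category, and the cap removes the "predicted below his own career high" oddity described for Tokusegawa.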
Randomitsuki 2,819 Posted January 10, 2010

"Shikona     Rank    Age  Exp  Best Rank      Prediction
==========================================================
Takanoiwa   Ms13w   19   7    Makushita 13   Ozeki

I'll boldly predict he'll be exhibit #239 for my belief that career-high rank is not quite useful when it deviates significantly from the career-high KK rank. (Kaisei will be exhibit #238.)"

As you know, I even collected data about the highest KK ranks during rikishi careers. But by and large, SPSS has shown me that for rikishi in their third year onward the best predictor for "career highest rank" (to use the terms from the Doitsubase) is "highest rank" and not "highest KK rank".

The best predictors for different experience levels are:

Strength rating in their second basho.
Current rank in their third to twelfth basho.
Highest rank after the twelfth basho.
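The three rules at the end of the post map directly onto a small lookup. The sketch below is illustrative only: the function name `best_predictor` and the returned labels are made up, and the post says nothing about a rikishi's first basho, so that case returns `None` here and does not guess.

```python
def best_predictor(basho_number):
    """Which variable the post names as best predictor at a given career length."""
    if basho_number == 2:
        return "strength rating"
    if 3 <= basho_number <= 12:
        return "current rank"
    if basho_number > 12:
        return "highest rank so far"
    return None  # first basho: not covered in the post


for n in (2, 3, 12, 13, 24):
    print(n, best_predictor(n))
```

For a rikishi in his 24th basho, like the two examples raised earlier in the thread, this picks the highest rank so far and not the current rank.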