Randomitsuki, posted November 27, 2019 (edited)

Ever since I discovered sumo for myself, I have been fascinated by kimarite: the rarer, the better, and we all know that there is a bewildering variety of strange kimarite. At the same time, I must admit that I am completely befuddled by exactly that bewildering variety. Not only that, I am also quite clueless about the average kimarite on an average day (I tested this the other day by watching one of Moti's videos and trying to predict the kimarite as I saw the bouts; I got less than 20% correct, and we are talking mostly about yorikiri, oshidashi etc.).

Therefore I embarked on a project aimed at learning more about the relations between different kimarite. I found this great NHK source, which shows videos of almost all kimarite (with basho bouts and staged training bouts, including an English explanation). I learnt quite a bit from this source. Yet at the same time, I realized that my visual skills and my anatomical knowledge are simply too limited: I am terribly bad at actually seeing what's going on on the dohyo at breakneck speed. So I decided to take an approach that is much better suited to my abilities. My idea was to learn about the relations between kimarite by collecting kimarite stats about rikishi. Ideally, I would have taken all bouts with kimarite information for all rikishi from SumoDB, but that would have taken years to collect, and I was only willing to spend days on it. So, rather than processing all kimarite of all rikishi, I only looked up the kimarite profiles of the 250 rikishi with the most recorded wins (taken from this list). Moreover, I compiled a list of 29 additional rikishi who had a high number of rare kimarite.
While 279 rikishi (out of a total of 7,258 rikishi with kimarite information) does not sound like good coverage, these 279 rikishi were involved in a total of 139,197 bouts with a valid kimarite, representing about a quarter of all kimarite-associated bouts in the database. Fairly representative, I'd hope.

My Excel sheets had the names of the 279 rikishi as rows and the 79 different kimarite (excluding non-kimarite like fumidashi, koshikudake, tsukite etc.) as columns. Each cell shows how often a given rikishi won with a given kimarite. Then I computed pairwise correlations among all combinations of kimarite (columns), which resulted in a 79 x 79 matrix, and converted the correlation coefficients into distance scores. From this point onwards, my truly statistics-versed colleague Lisa took over (huge thanks to her!!), and at my request she performed a statistical technique called multidimensional scaling, which tries to represent a complex 79 x 79 matrix as faithfully as possible in two dimensions; in other words, as some kind of map.

Here is how the map can be read: if two kimarite are close to each other on the map, they tend to be highly positively correlated. This means that rikishi who often (or rarely) used the one kimarite also often (or rarely) used the other. Conversely, a large distance on the map generally means that high counts for one kimarite typically go along with low counts for the other (in other words, these kimarite are negatively correlated and seem to be quite dissimilar). And the map looks like this:

[Figure: MDS map of the 79 kimarite, color-coded by cluster]

Based on another statistical technique which I will not bore you with, the data seem to indicate that the kimarite can be divided into five different clusters.
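For anyone who wants to try this at home, the pipeline described above (count table, pairwise correlations between kimarite columns, conversion to distances, multidimensional scaling) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Lisa's actual analysis: the post does not say how correlations were turned into distances, so the sketch assumes d = sqrt(2(1 - r)) (the Euclidean distance between standardized columns), and it uses classical (Torgerson) MDS via power iteration, which may well differ from the MDS variant actually used. All function names are made up for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length columns of win counts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a column with no variation: treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(table):
    """table: one row per rikishi, one column per kimarite -> k x k correlations."""
    cols = list(zip(*table))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def to_distance(corr):
    """Assumed conversion: d = sqrt(2 * (1 - r)), so r = +1 -> 0 and r = -1 -> 2."""
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - r))) for r in row] for row in corr]


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: double-centre the squared distances, then use
    the leading eigenvectors of the result as map coordinates."""
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in d2]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):  # power iteration for the largest eigenvalue
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(max(lam, 0.0))
        # deflate so that the next pass finds the next-largest eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Feeding it the 279 x 79 table would give one (x, y) point per kimarite. With numpy or scikit-learn available, a library implementation such as `sklearn.manifold.MDS` would be the more practical choice; the plain version above just makes each step visible.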
So I (or rather, Lisa) asked the algorithm to identify five areas on this map that "somehow" belong together. The different colors in which the kimarite names appear on the map represent this attempt to cluster the techniques. Just to remind you, this approach is 100% data-driven. The computer is obviously totally oblivious to anything to do with kimarite; it also cannot and does not know that "yorikiri" and "yoritaoshi" start with the same letters and should therefore be placed close to each other. The algorithm just looks for patterns. Nonetheless, many of these knowledge-free, data-driven categorizations and locations on the map make sense to me.

First, the algorithm discovered a cluster (in yellow) that has all the well-known oshi and tsuki techniques in relatively close vicinity, but also threw hatakikomi and tsukiotoshi into the mix. Sounds reasonable to me.

I was particularly pleased with the second cluster (color-coded in black), which contains the most common techniques starting with "okuri-" and sits not too far from the oshi/tsuki cluster.

Third, very distant from all the oshi stuff, there is the large green cluster in the upper left corner, and it contains the common yotsu techniques (yorikiri, yoritaoshi, but, as I expected, not too far from abisetaoshi and tsuridashi). I was also quite happy that kotenage, kimedashi, and kimetaoshi were grouped into the green cluster. One thing that surprised me (but is still in the data) is that uwatenage, uwatedashinage, and uwatehineri were farther apart than I thought (uwatedashinage was even sorted into a different cluster).

Fourth, there is the blue cluster in the lower left. Apart from the rather widely spaced trio shitatenage/shitatedashinage/shitatehineri, it contains many leg trips (among the common leg trips, only sotogake was sorted into the green cluster).
I don't know whether there is any physical reality to the assumption (suggested by the data) that uwate- techniques are closer to yotsu-zumo than shitate- techniques, and that sotogake is apparently closer to classical yotsu-zumo than uchigake. Maybe those with "real" sumo knowledge can chime in.

Finally, there is the big red cluster in the middle of the map. While it is closer to yotsu-zumo than to oshi-zumo, its central position indicates that the red techniques are related (or rather: unrelated) to all other clusters to a similar extent. I was a bit disappointed that so many ultra-rare, exotic (and potentially different?) kimarite are lumped together here, but again, maybe those with better knowledge can assess whether the relative positions within the red cluster make sense (e.g., is it justified to have tsukaminage much closer to standard yotsu than, say, ipponzeoi?).

But now I have blabbered enough. What do you think?

Edited November 27, 2019 by Randomitsuki
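The first post does not say which technique picked the five clusters. As one possible illustration (an assumption, not necessarily what Lisa used), k-means with k = 5 can be run on the two-dimensional map coordinates that come out of an MDS step. The sketch is plain Python, and `kmeans` is a hypothetical helper name.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on 2-D map coordinates; returns one cluster label per point."""
    rng = random.Random(seed)
    # k-means++ seeding: later centres are drawn far from the ones already chosen
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        total = sum(d2)
        if total == 0.0:  # fewer distinct points than clusters
            centres.append(list(rng.choice(points)))
            continue
        target, acc = rng.random() * total, 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= target:
                centres.append(list(p))
                break
    labels = None
    for _ in range(iters):
        # assignment step: each kimarite joins its nearest centre
        new = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new == labels:
            break  # converged: no kimarite changed cluster
        labels = new
        # update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels
```

Calling `kmeans(coords, 5)` on the 79 map points would assign each kimarite a label from 0 to 4, i.e. one color per cluster. Like any k-means run, the result can depend on the seed, so trying several seeds is a sensible check.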
Yamanashi, posted November 27, 2019

Wow, it's too much for me to process yet. One question, though: what are the x- and y- variables?
Randomitsuki, posted November 27, 2019

Just now, Yamanashi said:
Wow, it's too much for me to process yet. One question, though: what are the x- and y- variables?

That's a good question for which I only have a bad answer: the x and y axes are meaningless by themselves. Multidimensional scaling just tries to squeeze the complex relations among variables into a map as well as possible, and it definitely does not use any pre-coded categories. For instance, I did not tell the algorithm to put yotsu-zumo in the upper left corner, and in fact it is completely arbitrary that yotsu ended up in the upper left rather than the upper right corner. On this map, only the relative distances between the dots matter.
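The point that only relative distances matter can be checked directly: rotating, mirroring or shifting a map leaves every pairwise distance unchanged, so all such versions are equally valid MDS solutions. A small plain-Python sketch; the coordinates are invented for illustration and are not taken from the actual map.

```python
import math


def rotate(coords, angle):
    """Rotate every point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in coords]


def mirror(coords):
    """Flip the map left/right (upper left becomes upper right)."""
    return [(-x, y) for x, y in coords]


def shift(coords, dx, dy):
    """Move the whole map without changing its shape."""
    return [(x + dx, y + dy) for x, y in coords]


def pairwise_distances(coords):
    return [math.dist(p, q) for i, p in enumerate(coords) for q in coords[i + 1:]]


def same_map(a, b, tol=1e-9):
    """True if two layouts have the same pairwise distances (up to rounding)."""
    return all(abs(x - y) < tol
               for x, y in zip(pairwise_distances(a), pairwise_distances(b)))


# Invented coordinates for three kimarite, for illustration only
toy_map = [(-1.0, 0.8), (-0.9, 0.7), (1.2, -0.5)]
print(same_map(toy_map, mirror(toy_map)))  # True
```

Stretching the map, by contrast, changes the distances, which is why scale is the one thing an MDS map does carry (relative to the input distances), while orientation and position do not.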
Tsuchinoninjin, posted November 27, 2019

For the red cluster, there probably just isn't enough data to sort those kimarite into other clusters, so they all just end up in the middle as a 'default'. I'm not completely sure, but I would bet that those in the center of the cluster have a few fewer data points than those nearer the outside of the red cluster.

When you chose the rare-kimarite rikishi, did you limit it to sekitori or was anyone OK? Weird stuff goes down in the lower divisions. I think it can be argued that a lot of the rare kimarite are only possible when the loser does some really weird sumo or some other improper motion. In that case I think it's better if they aren't associated with 'good sumo' techniques, to be honest.
Randomitsuki, posted November 27, 2019

1 minute ago, Tsuchinoninjin said:
When you chose the rare-kimarite rikishi, did you limit it to sekitori or was anyone OK?

I started by looking up the rare kimarite themselves, e.g. this query for mitokorozeme. Names that cropped up repeatedly on these lists were included in the total dataset, no matter which division they were in.
Asashosakari, posted November 27, 2019 (edited)

2 hours ago, Tsuchinoninjin said:
When you chose the rare-kimarite rikishi, did you limit it to sekitori or was anyone OK? Weird stuff goes down in the lower divisions. I think it can be argued that a lot of the rare kimarite are only possible when the loser does some really weird sumo or some other improper motion. In that case I think it's better if they aren't associated with 'good sumo' techniques, to be honest.

Either that, or possibly also because rikishi who are high in unusual techniques often have extremely unorthodox styles or style mixes that just defy classification. That is potentially tied to your suggestion, too, in that one of their unorthodox abilities may be to take advantage of opponent mistakes in ways not otherwise seen. Anyway, my first thought was also that the rare stuff may just be too under-represented in the sample for the algorithm to establish strong associations with anything else.

On another note, I was kind of befuddled that the algorithm sorted hikiotoshi into the black cluster rather than the yellow one, but I guess it makes sense in some way, as hikiotoshi is often somewhat of an "opponent mistake" kimarite, too, as are the okuri techniques. (Or at least hikiotoshi is frequently used for that purpose, as it seems to be the go-to call for anything involving incidental contact that doesn't really fit any "proper" kimarite description from the yellow cluster.)

Edit: On the subject of rare kimarite, I wonder if sokubiotoshi would find itself part of the yellow cluster if it were more prevalent.

Edited November 27, 2019 by Asashosakari
Randomitsuki, posted November 27, 2019

1 hour ago, Asashosakari said:
Edit: On the subject of rare kimarite, I wonder if sokubiotoshi would find itself part of the yellow cluster if it were more prevalent.

That's true, I guess. I would not be able to distinguish sokubiotoshi from a hatakikomi, for sure.

On another note, a disadvantage of the method behind this map is that it can only be the best possible approximation of the actual correlational patterns behind it (that's why I wrote "tend to be highly correlated" rather than "are highly correlated" in my initial post). Case in point: the actual correlation between sokubiotoshi and hatakikomi is a fairly high +.29 on a scale from -1 = total dissimilarity to +1 = total similarity. In contrast, the correlation between sokubiotoshi and tsukitaoshi is non-existent at +.00. And yet, tsukitaoshi and sokubiotoshi are closer on the map than sokubiotoshi and hatakikomi. This is because sokubiotoshi and tsukitaoshi are similar in how often they do and do not co-occur with other kimarite. That's why the method is called multidimensional scaling: it is a near-impossible attempt to reduce something complex and multidimensional to something our eyes can understand (a two-dimensional map). But it's like trying to play Beethoven's 9th when all you have at your disposal is a recorder (I've put a Wikipedia link in here as I just learnt that word minutes ago).
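The explanation above (two kimarite can end up close together because they relate similarly to all the *other* kimarite, even when their direct correlation is zero) can be made concrete by comparing correlation profiles. In the sketch below, only the two values +.29 and +.00 come from the post; every other correlation is invented for illustration, and `profile_distance` is a hypothetical helper, not part of any MDS library. MDS itself works on the full distance matrix, so this only illustrates the intuition.

```python
import math

# Only the first two values are from the thread; all others are invented.
CORR = {
    ("sokubiotoshi", "hatakikomi"): 0.29,
    ("sokubiotoshi", "tsukitaoshi"): 0.00,
    ("hatakikomi", "tsukitaoshi"): 0.10,
    ("sokubiotoshi", "other_1"): 0.30,
    ("sokubiotoshi", "other_2"): -0.20,
    ("sokubiotoshi", "other_3"): 0.10,
    ("hatakikomi", "other_1"): 0.80,
    ("hatakikomi", "other_2"): -0.60,
    ("hatakikomi", "other_3"): 0.70,
    ("tsukitaoshi", "other_1"): 0.25,
    ("tsukitaoshi", "other_2"): -0.15,
    ("tsukitaoshi", "other_3"): 0.05,
}
NAMES = ["sokubiotoshi", "hatakikomi", "tsukitaoshi", "other_1", "other_2", "other_3"]


def r(a, b):
    """Look up a correlation in either order; a kimarite correlates 1 with itself."""
    if a == b:
        return 1.0
    return CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]


def profile_distance(a, b, names):
    """How differently a and b correlate with every *other* kimarite."""
    others = [k for k in names if k not in (a, b)]
    return math.sqrt(sum((r(a, k) - r(b, k)) ** 2 for k in others))


print(r("sokubiotoshi", "hatakikomi"), r("sokubiotoshi", "tsukitaoshi"))  # 0.29 0.0
print(round(profile_distance("sokubiotoshi", "hatakikomi", NAMES), 3))   # 0.883
print(round(profile_distance("sokubiotoshi", "tsukitaoshi", NAMES), 3))  # 0.209
```

With these made-up numbers, sokubiotoshi has the stronger direct correlation with hatakikomi, yet its profile is much closer to tsukitaoshi's, which is the kind of pattern that can pull two kimarite together on a two-dimensional map.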
Yamanashi, posted November 28, 2019

3 hours ago, Randomitsuki said:
That's true, I guess. I would not be able to distinguish sokubiotoshi from a hatakikomi, for sure. [...]

On that point, I witnessed a sokubiotoshi this basho in Makuuchi (was it by Tamawashi?), and I had to replay it several times before I saw the "chop" instead of a slap.
Jakusotsu, posted November 28, 2019

...and there might be many more sokubiotoshi if it weren't for dubious hansoku calls.
Eikokurai, posted November 28, 2019 (edited)

8 hours ago, Yamanashi said:
On that point, I witnessed a sokubiotoshi this basho in Makuuchi (was it by Tamawashi?), and I had to replay it several times before I saw the "chop" instead of a slap.

It was Azumaryu on day 13, when he was visiting from Juryo. I sort of knew what it was but couldn't quite remember, and then was like, "Oh, yeah, it's that one." Kind of underwhelming. It's such a fine line between that and hatakikomi. I'm not even convinced it was much of a chop. Quite odd that the distinction is drawn there, when you consider that other kimarite seem to come with a whole range of interpretations.

Edited November 28, 2019 by Eikokurai
serge_gva, posted November 28, 2019

Highly interesting statistics, thank you! It made me want to find a ranking of rikishi by their use of a given kimarite (in terms of percentage, not number of uses). With the database, I was able to find this percentage for each rikishi individually, but not to build a query that produces a ranking. Can someone help me?
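Given per-rikishi win counts by kimarite (collected however one manages to), the requested ranking is a share calculation plus a sort. A minimal sketch with invented placeholder rikishi and numbers; `kimarite_share_ranking` is a hypothetical helper, and `min_wins` is an assumed cut-off so that someone with a single career win does not top the list at 100%.

```python
def kimarite_share_ranking(wins, kimarite, min_wins=1):
    """Rank rikishi by the share of their wins achieved with `kimarite`.

    wins: {rikishi: {kimarite: number of wins}}
    Returns (rikishi, share, total wins) tuples, highest share first.
    """
    rows = []
    for rikishi, counts in wins.items():
        total = sum(counts.values())
        if total and total >= min_wins:  # skip rikishi with too few wins
            rows.append((rikishi, counts.get(kimarite, 0) / total, total))
    # ties are broken by more total wins, then alphabetically
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows


# Invented placeholder data, not real career records
wins = {
    "rikishi_A": {"yorikiri": 6, "oshidashi": 4},
    "rikishi_B": {"yorikiri": 1},
    "rikishi_C": {"oshidashi": 10},
}
for name, share, total in kimarite_share_ranking(wins, "yorikiri", min_wins=5):
    print(f"{name}: {share:.0%} of {total} wins")
# rikishi_A: 60% of 10 wins
# rikishi_C: 0% of 10 wins
```

Without the cut-off, rikishi_B would lead with 100% from a single win, which is why some minimum number of wins is worth setting for a career-level ranking.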
Jakusotsu, posted November 29, 2019

12 hours ago, serge_gva said:
It made me want to try to find a ranking of the use of a given kimarite by rikishi (in terms of percentage, not number of uses). With the database, I was able to find this percentage for each rikishi individually, but not to make a query that gives a ranking. Can someone help me?

Yubinhaad does this for each basho: http://www.sumoforum.net/forums/topic/39491-kimarite-statistics-2019-kyushu/
Randomitsuki, posted November 29, 2019

1 hour ago, Jakusotsu said:
Yubinhaad does this for each basho: http://www.sumoforum.net/forums/topic/39491-kimarite-statistics-2019-kyushu/

I believe that serge_gva inquired about the percentage per rikishi over his career. The data are on each rikishi's kimarite page, but there is no way to query these results. Only Doitsuyama could do it.
Gurowake, posted December 4, 2019

On 29/11/2019 at 04:29, Randomitsuki said:
I believe that serge_gva inquired about the percentage per rikishi over his career. The data are on each rikishi's kimarite page, but there is no way to query these results. Only Doitsuyama could do it.

Someone sufficiently skilled in writing a web-crawling script could probably do it too, by analyzing every rikishi's bouts-by-kimarite page. That wouldn't be me, certainly.
Akinomaki, posted December 4, 2019

6 hours ago, Gurowake said:
writing a web-crawling script could probably do it too, by analyzing every rikishi's bouts-by-kimarite page

Sumoreference is not only down pretty often, it is also quite resistant to large-scale web-crawling attempts. When I tried to get, e.g., all banzuke of the past, the database usually refused after a dozen or so.
Jakusotsu, posted December 4, 2019

1 minute ago, Akinomaki said:
Sumoreference is not only down pretty often, it is also quite resistant to large-scale web-crawling attempts. When I tried to get, e.g., all banzuke of the past, the database usually refused after a dozen or so.

Perhaps such attempts are causing the downtimes?
Akinomaki, posted December 4, 2019 (edited)

5 minutes ago, Jakusotsu said:
Perhaps such attempts are causing the downtimes?

The database still worked in the browser every time I tried such an attempt (quite a while ago), so these may be different causes, but they are probably related somehow.

Edited December 4, 2019 by Akinomaki
Asashosakari, posted December 4, 2019

2 hours ago, Gurowake said:
Someone sufficiently skilled in writing a web-crawling script could probably do it too, by analyzing every rikishi's bouts-by-kimarite page. That wouldn't be me, certainly.

The alternative would be to use spreadsheet magic to merge the data provided by something like this and this for as many kimarite as desired.
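The "spreadsheet magic" amounts to merging one result list per kimarite (rikishi plus win count) into a single rikishi x kimarite table, filling in zero wherever a rikishi does not appear in a kimarite's list. A plain-Python sketch of that merge; the input structure and the helper name `merge_kimarite_queries` are assumptions, and nothing here fetches or parses the actual SumoDB pages.

```python
def merge_kimarite_queries(results):
    """results: {kimarite: [(rikishi, wins), ...]}, one list per query.

    Returns (header, rows): header = ["rikishi", kimarite...], and one row per
    rikishi with a win count for every kimarite (0 if absent from that list).
    """
    kimarite = sorted(results)
    table = {}
    for k in kimarite:
        for rikishi, wins in results[k]:
            table.setdefault(rikishi, {})[k] = wins
    header = ["rikishi"] + kimarite
    rows = [[name] + [table[name].get(k, 0) for k in kimarite]
            for name in sorted(table)]
    return header, rows


# Invented placeholder query results
header, rows = merge_kimarite_queries({
    "yorikiri": [("rikishi_A", 5), ("rikishi_B", 2)],
    "oshidashi": [("rikishi_B", 7), ("rikishi_C", 1)],
})
print(header)  # ['rikishi', 'oshidashi', 'yorikiri']
for row in rows:
    print(row)
```

The resulting rows have exactly the shape of the Excel sheet from the first post (rikishi as rows, kimarite as columns), so they can be fed straight into a correlation step.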
Tsuchinoninjin, posted December 4, 2019

1 hour ago, Asashosakari said:
The alternative would be to use spreadsheet magic to merge the data provided by something like this and this for as many kimarite as desired.

My MATLAB mess can grab the rikishi ID and other relevant info from these HTML results pages and keep them associated; I can try to have it ingest them later when I have time. What's a good output? Something like

Kotoshogiku,yorikiri,240,oshidashi,20,etc,50

with just one rikishi per line?
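The one-rikishi-per-line format suggested above (name, then alternating kimarite and win count) is easy to produce and read back with Python's standard `csv` module. A sketch, assuming counts are whole numbers and kimarite are written in descending order of wins; `to_lines` and `from_lines` are hypothetical helper names, and `rikishi_A` is a placeholder whose counts simply echo the example line rather than verified records.

```python
import csv
import io


def to_lines(table):
    """{rikishi: {kimarite: wins}} -> text, one 'name,kimarite,wins,...' line each."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for rikishi, counts in table.items():
        row = [rikishi]
        for kimarite, wins in sorted(counts.items(), key=lambda kv: -kv[1]):
            row += [kimarite, wins]  # most frequent kimarite first
        writer.writerow(row)
    return buf.getvalue()


def from_lines(text):
    """Parse the same format back into {rikishi: {kimarite: wins}}."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, pairs = row[0], row[1:]
        # a dangling kimarite without a count (odd-length row) is ignored
        table[name] = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs) - 1, 2)}
    return table


text = to_lines({"rikishi_A": {"oshidashi": 20, "yorikiri": 240}})
print(text, end="")  # rikishi_A,yorikiri,240,oshidashi,20
```

Because each line only lists the kimarite a rikishi actually used, lines have different lengths; `from_lines` plus a merge into a zero-filled table turns them back into a rectangular rikishi x kimarite matrix.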