Jomangor

Daily wins scraping

Any programmers on the forums? I'm trying to scrape daily win-loss information for top-division bouts, but my two main sources (sumo.or.jp and sumodb) use images to record wins and losses. Is there anywhere I can find the results as ones and zeroes or as text (W, L, etc.)? Or is there a way for me to scrape image info into Google Sheets?

I'd be glad for any advice!

There's a text-only page available for results.  See http://sumodb.sumogames.de/Results_text.aspx.  There's also simple style: http://sumodb.sumogames.de/Results.aspx?b=202101&d=15&simple=on.
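
For anyone scripting this, here is a minimal sketch of building one simple-style URL per day. It assumes, judging only from the example link above, that `b` is the basho as YYYYMM and `d` is the day; the function name `simple_results_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base address taken from the example link in this post.
BASE = "http://sumodb.sumogames.de/Results.aspx"


def simple_results_url(basho, day):
    """Build a simple-style results URL for one day of one basho.

    Assumption: 'b' is the basho as YYYYMM and 'd' is the day number,
    inferred from the example URL rather than from any documentation.
    """
    query = urlencode({"b": basho, "d": day, "simple": "on"})
    return f"{BASE}?{query}"


# One URL per day of a 15-day tournament.
urls = [simple_results_url("202101", day) for day in range(1, 16)]
print(urls[-1])
```

The last entry should match the example link, which is an easy way to check that the parameters are in the right order before fetching anything.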

I've used the latter to scrape the results with Excel. I use it because it has links to each rikishi page (which Excel imports along with the text), and I keep track of rikishi by the number in the URL of their rikishi page, since shikona get reused all the time. You could certainly use the former and rely on the basho and rank to work out which of the many rikishi who have used a given shikona is meant, but with the latter it's easy to just extract the URL from the link. It's a bit more work, because you have to flip half the matches around so the winners are all in the same column, as they are with the former, but either way it's going to take some coding and compute time.
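
Outside Excel, the same two steps (keeping each link alongside its text, then flipping bouts so the winner comes first) could be sketched in Python roughly as below. The sample markup, the made-up shikona, and the `Rikishi.aspx?r=...` link form are all assumptions for illustration; the real page layout may differ, and how the win/loss marker is read from the page is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def winner_first(east, west, east_won):
    """Return the bout as (winner, loser) regardless of which side won."""
    return (east, west) if east_won else (west, east)


# Hypothetical markup and names: not copied from the real page.
sample = (
    '<tr><td><a href="Rikishi.aspx?r=101">Higashiyama</a></td>'
    '<td><a href="Rikishi.aspx?r=202">Nishiumi</a></td></tr>'
)
collector = LinkCollector()
collector.feed(sample)
east, west = collector.links
print(winner_first(east, west, east_won=False))
```

Keeping the href next to the shikona is what makes it possible to tell apart different rikishi who used the same name.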

Edited by Gurowake

I've written a Python app that scrapes daily results from the NSK site (sumo.or.jp) and uploads them to a MySQL DB. Happy to share the code if you'd like.

Edited by Houmanumi

Python is a few steps beyond my current ability, but I thank you for the offer!

The simple style might be just what I'm looking for! I'm using Google Sheets, but the scraping process is similar enough to Excel's. Up until now, I've used regular expressions to find what I want. How are you doing it so that it also scrapes the rikishi number?

The link is part of what imports into Excel from the simple style scrape. I use the VBA call Cells(R,C).Hyperlinks(1).Address to extract the hyperlink from the cell at row R and column C (the 1 selects the first hyperlink in the cell, I think). I then drop a fixed number of characters from the front of the extracted address, equal to the length of the path before the rikishi number, and copy the remaining characters onto the sheet to serve as the identifier for the rikishi instead of the hyperlinked shikona. If the link doesn't import into Google Sheets, or you don't have access to the Cells(R,C).Hyperlinks(1).Address call, then I can't help you. I stopped development on this project years ago, and I don't even keep the database up to date as much anymore, since I stopped using the engine that required it.
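
For comparison, the same idea outside VBA might look like the Python sketch below. It is not the code from my workbook. The link form in `PREFIX` is an assumption, and both function names are made up; the first mirrors the fixed-length approach described above, and the second reads the number from the query string instead, so it does not depend on the length of the path.

```python
from urllib.parse import urlparse, parse_qs

# Assumed link form, not confirmed anywhere in this thread.
PREFIX = "http://sumodb.sumogames.de/Rikishi.aspx?r="


def id_by_prefix_length(url, prefix=PREFIX):
    """Drop a fixed number of leading characters and keep the rest."""
    return url[len(prefix):]


def id_by_query(url, param="r"):
    """Read the ID from the query string; return None if it is absent."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None


url = PREFIX + "1234"
print(id_by_prefix_length(url))
print(id_by_query(url))
```

The query-string version also works on relative links such as `Rikishi.aspx?r=1234`, where counting characters from the front would cut in the wrong place.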

Edited by Gurowake

Google Sheets doesn't support VBA, but it does support JavaScript, so I guess I'll have to learn JavaScript. :-S

Thanks again to both of you for the help!
