I wanted to look back at all my Codewars solutions, and perhaps create some kind of portfolio frontend for them, so I had a look at the Codewars API. Unfortunately there was no endpoint to get the actual solution code, so I decided to try scraping it.
I had used BeautifulSoup in Python in the past, but didn't really like it, maybe because I didn't read the documentation fully or because I didn't understand as much as I do now. This time I decided to try Colly, in Go, a self-described 'Fast and Elegant Scraping Framework'.
To my surprise, it was really not too difficult. I was able to find everything I wanted to do easily in the documentation, and the examples were very helpful.
I really liked defining a structure for the shape of what I wanted.
type Kata struct {
	Kyu             string   `json:"kyu"`
	KataLink        string   `json:"kataLink"`
	KataTitle       string   `json:"kata"`
	LanguagesSolved []string `json:"languages"`
	Solutions       []string `json:"solutions"`
}
Having that struct made it really easy to extend the scraper bit by bit, and to change the schema without getting confused.
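As a rough sketch of why the struct helps, here is how the JSON tags shape the output once a Kata has been filled in. The encodeKata helper and the sample values are made up for illustration and are not taken from my scraper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Kata is the struct from the post; the json tags decide the output keys.
type Kata struct {
	Kyu             string   `json:"kyu"`
	KataLink        string   `json:"kataLink"`
	KataTitle       string   `json:"kata"`
	LanguagesSolved []string `json:"languages"`
	Solutions       []string `json:"solutions"`
}

// encodeKata is a hypothetical helper that serialises one Kata to JSON.
func encodeKata(k Kata) (string, error) {
	b, err := json.Marshal(k)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values, invented for this example.
	k := Kata{
		Kyu:             "6 kyu",
		KataLink:        "/kata/example",
		KataTitle:       "Example Kata",
		LanguagesSolved: []string{"go"},
		Solutions:       []string{"func Solve() {}"},
	}
	out, err := encodeKata(k)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Adding a field to the schema is then a one-line change to the struct, and the JSON output follows along.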
I also really liked the functions a collector has, such as OnHTML(), which runs a callback for every HTML element matching a jQuery-style selector, rather than looping over the elements myself, which is how I would have done it in Python. I also liked the functions for easily getting an element's children.
Configuring the scraper was also much easier than with BeautifulSoup. I wanted to set a cookie, and in Colly there's a function for it, whereas BeautifulSoup only parses HTML, so I would have needed another library to make the request and send the cookie.
In the end, I had more issues with the structure of the HTML than with the scraping itself. Colly made things easy enough that I was able to adapt the code to scrape a very different website with few changes.
I have built my Codewars solution scraper and you can see the source code here. I would still like to add more enhancements but that is separate to the scraping part.
I also built a Nestle website scraper for my side project, source code here. Again, I had more issues with the way the website was structured than with the scraping itself.
I would definitely recommend trying Colly for anyone looking to write any kind of crawler/scraper/spider.