Creating a useful Node.js-based crawler application

When useful information is not available in a raw format (e.g. CSV or JSON), you often have to extract and reshape the data yourself so it becomes available in a format you can actually use.

In my case, the data was the national exam results of all secondary schools in my country. It was archived on a website I found, so the need arose to crawl the data from that website.

Tooling

Tokeo Crawler used JSDOM to parse the DOM and jQuery for simple DOM node selection. The HTTP library was Superagent, and I also used lodash for its helper functions.
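
The pieces fit together roughly as in the sketch below. The URL, CSS selector, and function name are illustrative placeholders, not the crawler's actual code.

```js
// Minimal sketch: Superagent fetches the page, JSDOM parses it,
// jQuery selects nodes, and lodash cleans up the extracted values.
const superagent = require('superagent');
const { JSDOM } = require('jsdom');
const _ = require('lodash');

async function fetchResultsPage(url) {
  // Superagent makes the HTTP request and hands back the raw HTML.
  const res = await superagent.get(url);

  // JSDOM turns the HTML string into a queryable DOM.
  const { window } = new JSDOM(res.text);

  // jQuery is bound to the JSDOM window for simple node selection.
  const $ = require('jquery')(window);

  // The selector here is hypothetical; the real pages had their own structure.
  const rows = $('table tr')
    .map((i, tr) => $(tr).text().trim())
    .get();

  // lodash's compact drops empty strings from the scraped rows.
  return _.compact(rows);
}
```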

Functioning

Looking back, this was one of the most complex projects I had set myself up for at the time, and fairly so: I was copying data from the DOM elements of thousands of pages, each fetched with a similar request, and then sorting the data into a useful structure so I could save it in a MongoDB database using the Mongoose ODM.
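
As a rough illustration of the storage side, a Mongoose model for exam results might look like the sketch below; the schema fields, model name, and connection string are assumptions, not the project's actual schema.

```js
const mongoose = require('mongoose');

// Hypothetical shape of one school's results; the real schema differed.
const resultSchema = new mongoose.Schema({
  schoolCode: String,
  schoolName: String,
  year: Number,
  candidates: [
    {
      indexNumber: String,
      division: String,
      subjects: [{ name: String, grade: String }],
    },
  ],
});

const Result = mongoose.model('Result', resultSchema);

async function saveResults(results) {
  // Connection string is illustrative only.
  await mongoose.connect('mongodb://localhost:27017/tokeo');
  await Result.insertMany(results);
  await mongoose.disconnect();
}
```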

I applied many functional programming techniques that I had picked up in my previous tutorial-hell cycles, and they proved very useful for manipulating the data in a less stressful, saner way.
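
For example, lodash lets this kind of data shaping be expressed as a small pipeline; the field names below are hypothetical.

```js
const _ = require('lodash');

// Group raw scraped rows by school and sort each group by candidate index.
function shapeResults(rawRows) {
  return _(rawRows)
    .filter((row) => row.indexNumber)                  // drop rows with no candidate
    .groupBy('schoolCode')                             // one bucket per school
    .mapValues((rows) => _.sortBy(rows, 'indexNumber'))
    .value();
}
```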

Summary

Writing Tokeo Crawler was quite an adventure that took my Node.js skills and confidence to a very high level. It was then that I realized the limitless possibilities of my abilities, which motivated me to migrate from MongoDB to PostgreSQL, write better SQL automation in PL/pgSQL to normalize the data into multiple tables, build a Node.js API on top of the resulting database, and later write a mobile application in React Native and publish it on the Play Store.

Link to Tokeo Crawler Repository