A post after a long time. To be honest, a lot of us have been juggling our normal lives and the website work, so dedicating time to write a blog post is not always possible. Having said that, let’s talk about today’s post.
So this post is all about the architecture of the site and how we have been able to maintain it with zero financial cost for servers and related infrastructure. It walks through the way the site operates, from data entry to the final visualisation. Before I begin, it’s important to note that we have kept the site free of ads and corporate associations and have still managed to serve millions of site visits, all thanks to free platforms like GitHub and Google Sheets. Without these freely available technologies, none of this would’ve been possible or sustainable in the long run.
Let’s dive in then!
So here’s a diagram that can help you understand the overall flow:
Before I start, a few naming conventions I’ll use -
Webserver - a server that serves a webpage to a client (browser, mobile app, etc.).
Backend/application server - a server that takes requests on the fly and uses business logic to return data dynamically.
GitHub Pages websites also come with some limitations on site size and the like. These are documented here: GitHub Pages.
Before I move further: the maps we show on the website are adapted from Guneet’s GitHub repo. Some modifications have been made to the maps based on the site’s requirements. Otherwise, most of these are up to date with the latest district boundaries.
So now, let’s talk about the backend server. This is where the business logic traditionally sits. Usually there is a server like Node.js or Apache, which takes requests, fetches data from a data source, and returns the response. The logic in this layer is used to modify data (validity checks, joins, translations and other operations). This layer might also include a data server, which is usually a database (MySQL, Oracle, Redis and many more). This is how a traditional backend looks. Just a glance at this system tells us it would mean having machine(s) to run our application server and optionally a database server. Once we enter this realm, we get into load balancing, DR (disaster recovery), HA (high availability) and other technical requirements that have to be managed either by the owner of the machine(s) or handled using services like AWS or Azure. Either way, it’s a heavy and potentially costly operation, and hence not viable for a revenue-free, volunteer-run initiative.
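To make the contrast concrete, here’s a minimal sketch (the data, field names and function names are all hypothetical, not our actual code). In a traditional backend, the logic runs on every request; with a static site, the same logic runs once, ahead of time, and the results are checked in as plain files:

```python
import json

# Hypothetical stand-in for a live database (MySQL, Redis, ...).
DATABASE = {
    "delhi": {"confirmed": 100, "recovered": 90},
    "kerala": {"confirmed": 200, "recovered": 180},
}

def handle_request(state: str) -> str:
    """Traditional backend: validate and compute a response per request."""
    record = DATABASE.get(state.lower())
    if record is None:
        return json.dumps({"error": "unknown state"})
    # Business logic (here, deriving the active count) runs on the fly.
    record = dict(record, active=record["confirmed"] - record["recovered"])
    return json.dumps(record)

def build_static_files() -> dict:
    """Static approach: run the same logic once and emit flat .json files
    that a static host like GitHub Pages can serve as-is."""
    return {f"{state}.json": handle_request(state) for state in DATABASE}
```

The static approach trades per-request flexibility for zero server maintenance, which is exactly the trade-off that works for a volunteer-run site.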
Now who populates these JSON files?
Enter GitHub Actions! A detailed description of GitHub Actions is out of scope for this article, so let me simplify it to our use case and explain what we do.
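For readers unfamiliar with Actions, a scheduled workflow looks roughly like this (a sketch only; the workflow name, script name and exact steps here are assumptions, not our actual file):

```yaml
# Hypothetical workflow; real workflows live under .github/workflows/
name: sync-sheets
on:
  schedule:
    - cron: '*/10 * * * *'   # run every 10 minutes
  workflow_dispatch:          # also allow manual runs
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python sync_sheets.py   # hypothetical script name
```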
Every 10 minutes, we invoke an action that carries out the following set of operations:
- First, we fetch data from a set of Google Sheets.
- Then we prepare JSON files corresponding to these sheets, along with some other JSON files that we were historically using on the website.
- Next, we read these JSON files and prepare additional JSON files to be consumed by the website.
- Finally, all these JSON files get checked in to the API GitHub Pages repository (CSV files follow the same approach).
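The core of the first two steps can be sketched in a few lines. This assumes the sheet is fetched as CSV (for example via a Google Sheets export URL); the function names and sample data are hypothetical:

```python
import csv
import io
import json

def sheet_csv_to_json(csv_text: str) -> list[dict]:
    """Turn one exported sheet (CSV text) into a list of records."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def write_api_file(path: str, records: list[dict]) -> None:
    # The generated file is what eventually gets checked in to the
    # API GitHub Pages repository.
    with open(path, "w") as f:
        json.dump(records, f, indent=2)

# Example: a two-row sheet becomes a JSON array of objects.
raw = "district,confirmed\nPune,10\nMumbai,20\n"
records = sheet_csv_to_json(raw)
```

Note that `csv.DictReader` keeps every value as a string; any type coercion or validation would happen in the intermediate processing step before the final files are written.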
The steps above are representative of what we do; the exact steps might differ a bit and can be checked here.
In case you are interested in how we sync the sheets to GitHub, here’s the code.
The last, yet most important, layer - the humans
This is where the majority of the work happens. Volunteers make entries into Google Sheets, which get synced using the process detailed above. To aid the volunteers, we’ve developed some automation using Telegram bots; I’ll soon be writing a detailed post on the Telegram bots we use. In brief, these bots read bulletins fed in by volunteers (or sometimes pulled from predefined sources) and generate the data that needs to be added to the sheets. Here too, the requests are served using GitHub Actions, which internally invoke Python scripts to do the data operations.
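As a rough illustration of what such a bot might do (the bulletin format, regex and field names here are invented for the example, not our actual parser), turning a plain-text bulletin into sheet-ready rows can be as simple as:

```python
import re

# Hypothetical bulletin line format: "Pune: 120 new cases, 95 recovered"
LINE_RE = re.compile(
    r"(?P<district>[A-Za-z ]+):\s*(?P<cases>\d+) new cases,"
    r"\s*(?P<recovered>\d+) recovered"
)

def parse_bulletin(text: str) -> list[dict]:
    """Extract one row per matching bulletin line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append({
                "district": m.group("district").strip(),
                "cases": int(m.group("cases")),
                "recovered": int(m.group("recovered")),
            })
    return rows

bulletin = "Pune: 120 new cases, 95 recovered\nNagpur: 40 new cases, 35 recovered"
rows = parse_bulletin(bulletin)
```

In practice a human still reviews the generated rows before they land in the sheets, since real bulletins are far messier than this example.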
Why we chose not to run ads
As detailed above, we incur zero logistics costs for running the website. This has indeed come with its own limitations in terms of functionality. But given the overall features that can be achieved using these free-to-use technologies, it makes little sense to invest in and maintain our own servers, or to team up with corporates for free machines to run the website. The approach we’ve taken helps us remain independent of any associations, financial or otherwise. And more than anything, it is a testament to how not-so-difficult it is to keep things open and free if there’s a will to do so.
Until we write again,