Two weeks in production. Learnings and failures.

Aka I was using a dev server in production

The other day I found out that my Docker container, which is served on Google Cloud Run, is randomly losing its connection to my Firebase DB. This is weird because the two should work together seamlessly without me needing to do anything. Here’s how things work:
1. I commit my code to GitHub.
2. A GitHub Action runs. Part of that action is the following code:

The important thing here is the line export_default_credentials: true, which bundles the private keys into the container straight from Google Cloud. This works. Until it doesn’t.
3. I deploy my container to Cloud Run.
4. Winner winner chicken dinner.

The problem, though, is that if the container goes unused for some time, Google suspends it. When the next request comes in, Google restarts it. I suspect this is the point where my Firebase connection is lost. My other suspicion is that the OAuth2 token that gets created expires at some point and my container isn’t picking up a new one. If I restart my container, everything works again. I still haven’t got to the bottom of this.
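One possible stopgap while the root cause is unknown is to check the connection right before using it and rebuild it on failure. Here is a minimal sketch of that idea. Note that `ping` and `reconnect` are hypothetical coroutines standing in for whatever my Firebase client actually needs; they are not part of any Firebase SDK.

```python
import asyncio
from typing import Awaitable, Callable


async def ensure_connected(
    ping: Callable[[], Awaitable[None]],       # hypothetical: raises if the DB is unreachable
    reconnect: Callable[[], Awaitable[None]],  # hypothetical: rebuilds the client/credentials
    retries: int = 2,
) -> bool:
    """Return True once ping() succeeds, reconnecting between failed attempts."""
    for attempt in range(retries + 1):
        try:
            await ping()
            return True
        except Exception:
            if attempt == retries:
                break  # out of attempts, give up
            await reconnect()
    return False


if __name__ == "__main__":
    # Toy demo with an in-memory "connection" that starts out broken.
    state = {"alive": False}

    async def demo_ping() -> None:
        if not state["alive"]:
            raise ConnectionError("connection lost")

    async def demo_reconnect() -> None:
        state["alive"] = True

    print(asyncio.run(ensure_connected(demo_ping, demo_reconnect)))
```

Calling this at the start of a request handler, rather than in a background loop, means the check runs exactly when the container has just been woken up by a request.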

Reading Logs

While reading the logs to try and get to the bottom of this, I noticed another issue with my container/server setup.

I was essentially running my app in dev server mode. In my container I was doing python3 app.py and that was it. Clearly not a good idea. So I switched to uWSGI to run my server. The caveat is that I also need nginx running in that container, which is a bit annoying because my container size went from 390 MB to 646 MB. I will need to fine-tune that, but for now at least I have something that’s production-worthy.
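To illustrate the difference, here is a minimal sketch using only the standard library, not my actual app. The idea is that the module exposes a WSGI callable that a server like uWSGI imports, while the built-in single-process server is kept behind a function that is only meant for local development. The names `application` and `run_dev_server` and the port are assumptions for this example.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """Tiny WSGI app; a real framework's app object would take its place."""
    body = b"ok\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def run_dev_server(port: int = 8080) -> None:
    """Development only: the single-process reference server from the stdlib."""
    with make_server("", port, application) as httpd:
        httpd.serve_forever()


# In production the module is imported by the WSGI server instead of being run
# directly, for example (assumed invocation, adjust to your setup):
#   uwsgi --http :8080 --wsgi-file app.py --callable application
```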

Next steps

I will be implementing an async function that checks whether I still have a connection to Firebase and reconnects if not. Not sure how to do this yet but I’ll figure it out. Once that is out of the way, I want to start moving all the data into my database and stop scraping websites. I plan to do that in a lazy way.

  1. Check my DB. Do I have the data for Box-office/Oscars for said year? If yes, go to step 3. If not, go to step 2.

  2. If I don’t have the data, scrape the website again and save the result to Firebase. Go to step 3.

  3. Return the data from Firebase. Done.

After a few days I can delete the code for scraping completely.
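The three steps above could be sketched roughly like this. It is only an illustration under a few assumptions: `InMemoryStore` is a made-up stand-in for Firebase, and `scrape` is a hypothetical function that takes a year and returns whatever the scraper produces today.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Key = Tuple[str, int]  # (dataset name, year), e.g. ("box-office", 1997)


class InMemoryStore:
    """Made-up stand-in for Firebase so the sketch runs without a database."""

    def __init__(self) -> None:
        self._data: Dict[Key, Any] = {}

    def get(self, key: Key) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: Key, value: Any) -> None:
        self._data[key] = value


def get_year_data(
    store: InMemoryStore,
    dataset: str,
    year: int,
    scrape: Callable[[int], Any],  # hypothetical scraper for this dataset
) -> Any:
    key = (dataset, year)
    # Step 1: do I already have the data for this dataset and year?
    if store.get(key) is None:
        # Step 2: no, so scrape once and save the result.
        store.set(key, scrape(year))
    # Step 3: always answer from the store.
    return store.get(key)


if __name__ == "__main__":
    store = InMemoryStore()
    fake_scrape = lambda year: [f"movie-{year}"]
    print(get_year_data(store, "box-office", 1997, fake_scrape))
```

Each dataset/year pair is scraped at most once, so the database fills up on its own as people use the app.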

Finally, as my friend Ivan noticed, there are some data quality issues:

This happened because in 1997 there was a VHS re-release of The Godfather and it registered at the box office. My algorithm starts from the box-office list of movies for said year and works from there. As I move the data into my DB I will need to reassess this. I might also need to sanitize some data manually.

Thanks for reading.