Five Stages of Task Manager Grief
Techopedia defines a task manager as “… a utility that provides a view of active processes or tasks, as well as related information, and may also allow users to enter commands that will manipulate those tasks in various ways”.
Congrats, you built a data set
Most likely you built a cron job that calls a bash, Python or R script to dump data into a relational table. You stitched together some internal data with data from an API, and now there is a new table with great insights. A few people in marketing or analytics hear about the new insights, which help drive a strategic decision, and your boss is happy. Pop the champagne.
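A minimal sketch of that kind of pipeline, assuming a nightly cron entry, a hypothetical API endpoint, and a SQLite table (all names here are illustrative, not from any real system):

```python
# ingest.py -- illustrative nightly ingest script.
# crontab entry might look like:  0 2 * * * python /opt/jobs/ingest.py
import json
import sqlite3
import urllib.request


def fetch_api_rows(url):
    """Pull JSON records from an external API (URL is a placeholder)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def load(db_path, rows):
    """Dump the stitched-together records into a relational table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS insights (day TEXT, metric REAL)")
    con.executemany("INSERT INTO insights VALUES (?, ?)", rows)
    con.commit()
    con.close()
```

Nothing fancy: one script, one table, one cron line. That simplicity is exactly why the next four stages happen.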
Stage 1 — Denial: I don’t need a task manager
A senior engineer or close advisor tells you to implement a task manager like Airflow or Amazon's Simple Workflow Service. After looking at the complex documentation, you shrug them off. Those tools are too complicated and it's unnecessary. You'll be fine.
Stage 2 — Anger: I can’t believe my job keeps failing
You’re going about your day when you get a call from analytics. The data isn’t there. No problem, open up the server and diagnose the problem. Seems like the API changed. Okay, reconfigure the job a little and set it back in motion. Back to work.
Again, you get a call: the data isn't there. Okay, let's diagnose again. Seems like one of the columns you joined against in an external table has changed a bit. Diagnose, fix, move on. Again.
The job fails a third time, and you decide to be smart and implement a test that shoots an email if there’s no data. How clever you are.
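That freshness check can be as simple as a row count plus an `smtplib` alert. A sketch, assuming the SQLite table from before; the host, addresses, and day format are made up:

```python
import smtplib
import sqlite3
from email.message import EmailMessage


def data_is_missing(db_path, day):
    """Return True if the given day's partition came up empty."""
    con = sqlite3.connect(db_path)
    (count,) = con.execute(
        "SELECT COUNT(*) FROM insights WHERE day = ?", (day,)
    ).fetchone()
    con.close()
    return count == 0


def alert(day):
    """Shoot an email when the check fails (host and addresses are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = f"Ingest job produced no data for {day}"
    msg["From"] = "jobs@example.com"
    msg["To"] = "me@example.com"
    msg.set_content("Check the server -- the API may have changed again.")
    with smtplib.SMTP("localhost") as s:
        s.send_message(msg)
```

Clever, yes. But notice it only tells you the job failed; it doesn't tell you why, and it doesn't fix anything.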
Stage 3 — Bargaining: You don’t need to scale out the business on my dataset, it’s not necessary
Marketing is wondering if that data can be linked to another data set. Yup, it can. This means your job needs to wait on another job to happen before it can run. Okay, you set up the email system to check the other job, and set up your job to run after. Congrats, you’re now responsible for two data sets.
The job fails again. While testing it over and over again, you abused the API a bit too much, and it cuts you off. The company can’t get access to the data until the API resumes tomorrow. Fantastic.
Just getting notified isn’t enough; you implement a logger. Now the next time the job fails you’ll be able to check to see why. Should cut back on debugging time.
Now analytics is calling you when either job fails. It’s not like you have other work to do. The data is now necessary for certain functions of the company, and its reliability and availability are critical for the business. You try to push back, but it’s too late: the company is more successful because of your work; there’s no going back now.
Stage 4 — Depression: I don’t want to be responsible for this data set anymore
Maybe there is another engineer who can take this off your hands. Maybe an external service can manage this. Engineering shrugs; they’re overworked, while simultaneously playing Call of Duty.
Over the weekend, the job stopped; no data came in on Saturday or Sunday. You have to run the job for two days and make sure you didn’t duplicate data. Great way to start a Monday. Maybe you should have listened to your advisor who told you to build a task manager.
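The "make sure you didn't duplicate data" part is the trap. One common fix is to make each day's load idempotent: wipe that day's partition and reload it in a single transaction, so re-running Saturday and Sunday can't double-count. A sketch against the same illustrative SQLite table:

```python
import sqlite3


def backfill(db_path, day, metrics):
    """Reload one day's partition idempotently: delete it, then insert.

    Running this twice for the same day yields the same rows, so catching
    up a missed weekend is just calling it once per missed day.
    """
    con = sqlite3.connect(db_path)
    with con:  # one transaction: delete + insert commit together, or not at all
        con.execute("CREATE TABLE IF NOT EXISTS insights (day TEXT, metric REAL)")
        con.execute("DELETE FROM insights WHERE day = ?", (day,))
        con.executemany(
            "INSERT INTO insights VALUES (?, ?)",
            [(day, m) for m in metrics],
        )
    con.close()
```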
The job fails over a weekend. This time the data set grew too much and a single server couldn’t handle it. Time to do something drastic, as this is cutting into your sleep.
Stage 5 — Acceptance: Okay, I’ll build a task manager
Turns out that others have faced your same problems before. Depending on other tasks (precedence), recording run-time information (logging), emailing or texting systems (notifications), repopulating tables (backfilling), data scale (parallelism) and a dashboard to look at the tasks (system visibility) are regular problems for data engineers; unfortunately, you had to learn this lesson the hard way.
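In Airflow, most of those hard-won lessons collapse into a short DAG definition. A sketch, assuming Airflow 2.x; the DAG id, schedule, scripts, and email address are illustrative (this is pipeline configuration, so it only runs inside an Airflow deployment):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "depends_on_past": True,              # precedence across runs
    "retries": 2,                         # retry before waking anyone up
    "retry_delay": timedelta(minutes=10),
    "email": ["me@example.com"],          # notifications on failure
    "email_on_failure": True,
}

with DAG(
    dag_id="nightly_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",
    catchup=True,                         # backfilling: missed days get run
    default_args=default_args,
) as dag:
    upstream = BashOperator(task_id="upstream_job", bash_command="python upstream.py")
    ingest = BashOperator(task_id="ingest", bash_command="python ingest.py")
    upstream >> ingest                    # precedence: ingest waits on upstream
```

Logging, parallelism across workers, and the dashboard (system visibility) come with the scheduler and web UI; you stop rebuilding them one outage at a time.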
Caserta has had great success implementing Airflow as a data ingestion task manager, as it provides the foundation for system management and data quality. I encountered these problems while at Rocket Fuel, and implementing a task manager solved these issues effectively and efficiently.