3 Things All Data Engineers Should Learn from Big Data

DataExpert
3 min read · Jan 6, 2022

Prior to my time in Silicon Valley, I held roles as a Business Intelligence Engineer and Data Engineer in the healthcare and technology sectors. I learned more from just 2 years of observing big data practices than from all of my previous experience combined.

1. Compute is cheap compared to the time of Data Engineers

While I came from the land of query optimization, it had never occurred to me how inexpensive compute is relative to engineering time. Yes, some companies run on their own inexpensive products, but this lesson holds true for every company out there. If you compare only the compute cost of a highly optimized query with that of an equivalent but substantially less efficient one, query optimization looks great. However, once you factor in the time of the Data Engineer writing the query, which can be days or weeks for larger projects, the extra compute is clearly cheaper than the time spent optimizing. The same lesson applies to storage: instead of spending days writing, optimizing, and debugging a ‘merge’ statement, why not just snapshot the data at every refresh and store the result? This can cause data to accumulate rapidly, perhaps 1TB per pipeline per year. The cost of this much storage? Pennies next to an engineer’s time, and it is decreasing with time.
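To make the snapshot idea concrete, here is a minimal sketch in Python. It assumes a local directory stands in for cheap object storage and a CSV file stands in for the table. The function names `write_snapshot` and `read_snapshot` and the `ds=<date>` partition naming are hypothetical, chosen only for illustration. Each refresh writes the full current state to its own dated partition, so there is no merge logic to write or debug.

```python
import csv
from datetime import date
from pathlib import Path


def write_snapshot(rows, fieldnames, base_dir, snapshot_date):
    """Write the full current state of a table to its own dated partition.

    Nothing is merged or updated in place: each refresh adds a new
    directory, so snapshots from earlier dates stay untouched.
    """
    # Hypothetical layout: <base_dir>/ds=YYYY-MM-DD/data.csv
    partition = Path(base_dir) / f"ds={snapshot_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    out_path = partition / "data.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def read_snapshot(base_dir, snapshot_date):
    """Read back the table as it looked on a given date (values as strings)."""
    path = Path(base_dir) / f"ds={snapshot_date.isoformat()}" / "data.csv"
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

Calling `write_snapshot(rows, ["id", "status"], "warehouse/orders", date(2022, 1, 6))` on every refresh leaves one directory per day, and any past state can be read back with `read_snapshot`. The trade-off is the one described above: storage grows with every refresh, in exchange for a pipeline that is trivial to write and to re-run.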
