5 steps to consider when picking a tool for your next data engineering project
A systematic approach for selecting data technologies and vendors when starting a new data engineering project.
Building systems is hard. Building future-proof, scalable data systems is even harder. At some point in the architecture process, you will have to choose between certain tools to achieve some goal of the system. These tools can be data warehouses, object storage, cloud providers, protocols, or even programming languages. In this article, we will go over a few best practices that can help guide you in this selection process.
The famous saying, "If all you have is a hammer, everything looks like a nail," is a great explanation of the dangerous mental trap we might fall into when we resort to tools we are familiar with to tackle new problems.
Let's look at 5 helpful thought exercises you can use when in a situation like this. Following this framework will help you through the process of finding, evaluating, and eventually choosing a solution for a problem.
1. Understand the problem you are solving
Before you look at the tools available to solve your problem, make sure you deeply understand the problem you are trying to solve. Make sure you have clearly defined the technical and business requirements of your system. This will help you narrow down the tools that are available and can help you decide which ones are more suitable for the job.
While on the road to gaining domain knowledge, you will encounter various products, stakeholders, salespeople, and other folks in similar shoes trying to navigate these waters. Reach out to them with your questions to accelerate this process.
One way you can get better at understanding the problem is to talk to as many stakeholders as you can and try to understand their business needs. This will help you identify the right tool for the job by understanding which ones meet their needs better.
2. Evaluate your existing tools and resources
Take a step back and list out all the tools and resources you already have at your disposal. Think carefully about the pros and cons of each tool and how it can help you solve the new problem you are facing. Often some other part of the organization has faced a similar issue and has already integrated a tool to solve it. This could speed up the decision process, as you could ask that team to share their architecture decision records (ADRs) and save yourself a bunch of work!
It is important to keep in mind that just because you already have a tool that is being used in your stack, it does not mean you are obliged to use it to solve new challenges. If a different tool solves your problem more efficiently, you should consider integrating the new tool into your stack.
3. Choose the tool that meets your needs
Once you have a clear understanding of the problem and have evaluated your existing tools and resources, you can start narrowing down the list of possible tools. Make sure to choose a tool that meets your needs and offers the features and flexibility you need.
First and foremost, look for tools that specialize in solving your actual challenge - there will always be tools that claim to solve a lot more than one thing, but this usually means that they provide sub-par solutions for each of them. Developers of specialized tools are more likely to have a clear focus on the problem their tool solves, which is the same problem you have.
While you are dissecting a product in this step, take the opportunity to gauge other, auxiliary features it provides - while this shouldn’t be a deciding factor, these qualities can become tiebreakers in the end.
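One way to make the comparison in this step concrete is a small weighted decision matrix. The sketch below is only an illustration: the tool names, criteria, weights, and 1-to-5 scores are all made up, and the `extras` field is a hypothetical count of auxiliary features that is consulted only when two tools tie on the weighted score.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    scores: dict[str, int]  # criterion -> score from 1 (poor) to 5 (great)
    extras: int = 0         # auxiliary features; used only to break ties


def weighted_score(candidate: Candidate, weights: dict[str, int]) -> int:
    """Sum of weight * score over every criterion listed in `weights`."""
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates: list[Candidate], weights: dict[str, int]) -> list[Candidate]:
    """Best candidate first; `extras` only matters when weighted scores are equal."""
    return sorted(
        candidates,
        key=lambda c: (weighted_score(c, weights), c.extras),
        reverse=True,
    )


# Hypothetical requirements and candidates, for illustration only.
WEIGHTS = {"solves_core_problem": 5, "operational_cost": 3, "team_familiarity": 1}
CANDIDATES = [
    Candidate("tool_a", {"solves_core_problem": 5, "operational_cost": 3, "team_familiarity": 2}, extras=1),
    Candidate("tool_b", {"solves_core_problem": 5, "operational_cost": 3, "team_familiarity": 2}, extras=4),
    Candidate("tool_c", {"solves_core_problem": 2, "operational_cost": 5, "team_familiarity": 5}, extras=9),
]

if __name__ == "__main__":
    for c in rank(CANDIDATES, WEIGHTS):
        print(f"{c.name}: {weighted_score(c, WEIGHTS)}")
```

Because the core requirement carries the largest weight, a tool with many extras but a weak fit for the actual problem (here the made-up `tool_c`) still ranks last, while the extras decide between the two tools that are otherwise tied.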
4. Do your research
Now that you have narrowed down the possible tools, it is time to do some research. Read up on the features and limitations of the different tools and see which ones match your requirements. Talk to people who have used the tools and get their insights. This will help you make an informed decision.
It is very important to be diligent in this step. If time allows, build a proof-of-concept project with each tool and test it rigorously. If you can, build reproducible benchmark suites that mirror your real production use case. This will help you quantify the differences between the tools, which makes them easier to compare.
Make sure to get in contact with the developers of each tool and ask them all the questions you possibly can - you don't want skeletons falling out of the closet after signing a multi-year contract. Try to read between the lines of what the sales team tells you and, if possible, ask for a dedicated developer to answer your deep technical questions.
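A reproducible benchmark can start out very small. The following is a minimal sketch, assuming each tool can be wrapped in a plain Python function: `make_workload`, `benchmark`, `candidate_a`, and `candidate_b` are hypothetical names, and the two candidates are trivial stand-ins for "run the same job on tool A and tool B". The fixed seed means every run sees identical input, and the warm-up run keeps one-off startup costs out of the timings.

```python
import random
import statistics
import time
from typing import Callable


def make_workload(rows: int, seed: int = 42) -> list[int]:
    """Deterministic synthetic input: same seed, same data on every run."""
    rng = random.Random(seed)
    return [rng.randint(0, 1_000_000) for _ in range(rows)]


def benchmark(
    task: Callable[[list[int]], object],
    data: list[int],
    repeats: int = 5,
    warmup: int = 1,
) -> dict[str, float]:
    """Time `task(data)` `repeats` times after `warmup` untimed runs."""
    for _ in range(warmup):
        task(data)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task(data)
        timings.append(time.perf_counter() - start)
    return {
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Hypothetical stand-ins for the same job executed on two different tools.
def candidate_a(data: list[int]) -> list[int]:
    return sorted(data)


def candidate_b(data: list[int]) -> list[int]:
    return sorted(data, reverse=True)


if __name__ == "__main__":
    workload = make_workload(rows=100_000)
    for name, task in [("candidate_a", candidate_a), ("candidate_b", candidate_b)]:
        print(name, benchmark(task, workload))
```

In a real evaluation the workload would come from a sample of production data rather than random integers, and the median is usually a safer number to compare than a single run.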
5. Look for scalability
Finally, look for scalability when evaluating your tools. Make sure the tool can scale with your data as your system grows. This will help you avoid future headaches and ensure your system can handle the increased demand. Evaluating this can be tricky - you have to look not only at the technical aspect but, where applicable, at the business side as well. If you are choosing between vendors, make sure they can keep up with your demands if you suddenly have to scale 100x. Some vendors are fine with serving small to medium clients, and that's what they plan to support in the future as well; it's best to make sure your goals are compatible in this regard before joining forces.
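The 100x question can be turned into a quick back-of-the-envelope check. This is a rough sketch with invented numbers: `documented_limit` stands for whatever ceiling a vendor states or your own benchmarks showed, and both function names are hypothetical.

```python
def headroom(current_load: float, documented_limit: float) -> float:
    """How many times the current load fits under the known limit."""
    if current_load <= 0:
        raise ValueError("current_load must be positive")
    return documented_limit / current_load


def survives_growth(
    current_load: float, documented_limit: float, growth_factor: float = 100.0
) -> bool:
    """True if the tool still fits after the load grows by `growth_factor`."""
    return headroom(current_load, documented_limit) >= growth_factor


if __name__ == "__main__":
    # Invented figures: 50 GB/day today, against two hypothetical ceilings.
    print(survives_growth(current_load=50, documented_limit=10_000))
    print(survives_growth(current_load=50, documented_limit=2_000))
```

A check like this only covers the technical ceiling; whether the vendor is willing and able to support a client at that size still has to be asked directly.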
Conclusion
Choosing the right tool for your data system can be a tricky task. It's important to take the time to understand the problem you are trying to solve, evaluate your existing tools and resources, choose the tool that meets your needs, do your research, and look for scalability. Following these 5 best practices will help you make an informed decision when it comes to selecting the right tool for your data system.