Imagine buying a trade ship in the 18th century. You would probably look for a ship with a particular purpose in mind: certain trade routes, and the cargo you expect your clients to buy. You would then place an order at a shipyard and invest your money to get the expected results. There are of course multiple shipyards that can deliver on this order. Some specialize in fast, agile ships; others prefer to build slower but robust vessels that can withstand harsh conditions.
After the ship is ready, you are going to need a seasoned captain, ideally one familiar with the route you are planning and aware of the risks that could be on the horizon. He should have the respect of his crewmates, be able to boost their morale in times of need and have the authority to make any decision. The captain should know which risks to take, which to avoid and how to get to the destination on time.
The captain would then select his crew to get the job done. He would take his first mate, a boatswain and any other specialists that are necessary, and then a healthy number of able-bodied seamen, ordinary seamen and apprentices to carry out the work on the ship.
Once this is done, the captain and his first mate would agree on how to get the ship to its destination, taking the best possible route they can come up with. They would stock up on provisions and set sail with the crew.
This is of course a greatly simplified view of what would happen. It leaves out pirates and politics, for example (whichever you feel is the higher risk here).
Now imagine that instead of a ship you are starting a Data Warehouse project. There are many similarities: the data (your priceless cargo) needs to be transported (shipped) from your source systems (the suppliers) to your target Business Intelligence or Artificial Intelligence solutions (the customers/destinations). You need to build the Data Warehouse (the ship) by selecting from different DWH vendors (the shipyards) depending on your needs.
And yet there are some common pitfalls where our metaphorical “ship” takes an unexpected wrong turn.
I have observed a lot of Data Warehouse projects whose sole purpose was to build a Data Warehouse (or Data Lake) and put all the available data into it. How that data would be used was to be decided later, once the Warehouse was complete.
And yet a warehouse without a purpose is worthless, much like a ship bought without a destination and cargo in mind. No company would commission a ship to stand idly in the port, and no sane captain would take “we will figure it out later” as a destination.
Only by clearly setting the “destination” can a Warehouse project be successful. Be it analytical reporting, real-time analytics, generative artificial intelligence or machine learning algorithms, the purpose shapes the architecture much like the route and the cargo shape the ship design.
This does not mean that the original purpose never changes. It is very possible that the market situation will shift and either the cargo or its destination will be different. To adapt to this, and to expand, you sometimes return to the shipyard and make adjustments, or commission another ship with a different purpose in mind. This will lead you to create a trade fleet of your own; in the Data Warehouse world this means expanding from a single Data Warehouse into a Data Mesh.
Data mesh is an approach to building a decentralized data architecture by leveraging a domain-oriented, self-serve design. In plain English, this means building a data architecture made up of multiple data solutions owned by different domains (e.g. Sales, Marketing, Finance), usually also serving different needs (e.g. Reporting, AI). Some platforms like Snowflake are particularly good at this, keeping the storage (Database) and compute (Virtual Warehouse) parts separate so that each can be fitted to different needs.
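For the more technically minded reader, here is a minimal sketch of that idea in Python. It is only an illustration, not how Snowflake or any other platform is implemented: the class names (`DataProduct`, `ComputePool`, `DataMesh`), the domains and the sizes are all hypothetical. The point it tries to show is that each domain publishes its own data, while separately sized compute is attached depending on who needs to read it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset owned and published by one business domain (the storage side)."""
    domain: str
    name: str


@dataclass
class ComputePool:
    """A hypothetical, independently sized compute resource (the compute side)."""
    name: str
    size: str  # illustrative label only, e.g. "small" or "large"


@dataclass
class DataMesh:
    """Catalog of domain-owned data products and the compute pools reading them."""
    products: dict = field(default_factory=dict)
    grants: dict = field(default_factory=dict)

    def publish(self, product):
        # Register a data product under its owning domain.
        key = (product.domain, product.name)
        self.products[key] = product
        self.grants.setdefault(key, [])

    def attach(self, pool, domain, name):
        # Let a compute pool read a published product; the data itself is not copied.
        key = (domain, name)
        if key not in self.products:
            raise KeyError(f"unknown data product: {domain}.{name}")
        self.grants[key].append(pool)

    def consumers(self, domain, name):
        # Names of the compute pools attached to a product, in attachment order.
        return [pool.name for pool in self.grants[(domain, name)]]


mesh = DataMesh()
mesh.publish(DataProduct("sales", "orders"))
mesh.publish(DataProduct("finance", "invoices"))

# Two workloads with different needs, sized independently of the data.
reporting = ComputePool("reporting", "small")
ml_training = ComputePool("ml_training", "large")

mesh.attach(reporting, "sales", "orders")
mesh.attach(ml_training, "sales", "orders")
mesh.attach(reporting, "finance", "invoices")

print(mesh.consumers("sales", "orders"))  # ['reporting', 'ml_training']
```

In ship terms, the same cargo hold (the Sales orders) is served by two different crews, a small one for reporting and a large one for machine learning, and neither has to be resized when the other changes.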
This way your trade fleet stays flexible and cost-effective, as each ship is designed to fit its purpose without overspending on unnecessary features. If you would like to learn more about different approaches to Data Warehouse, please visit https://www.phronesispath.com/service/data-warehouse/
Imagine a ship with a Captain, a First Mate and a dozen able-bodied seamen but no less-experienced hands. The ship would probably get to its destination, but would it be the most effective way? A veteran crew will get the job done, but they will also be bored to death or even angry that they need to scrub the deck as well, since there is no one else there to do it.
Not to mention that the cost of their wages would cut heavily into the profitability of the ship. Eventually, despite the good pay, those veterans would find more “suitable” positions on a different vessel (and with a different trading company as well).
From a data engineering perspective, this is equivalent to employing only senior or architect-level Data Engineers. If you make an architect write day-to-day SQL statements, you will hear their curses even from your captain's quarters. Not to mention how much of a waste of money it is to make this data guru perform the easy tasks. This is why ships had different crewmen to complement the “senior” staff: ordinary seamen, apprentices and so on.
This is exactly what a healthy team looks like: there is a Project Manager or Product Owner (Captain), an Architect or Lead Engineer (First Mate), Senior Engineers (able-bodied seamen) and a number of engineers and junior engineers (ordinary seamen, apprentices). This way your team works efficiently and the senior staff can focus on the challenges and innovation (as opposed to scrubbing the deck).
But companies dread this, fearing that senior staff will be tied up teaching the less-experienced and correcting their mistakes, leading to lost time and effort. This is where Data Warehouse Automation tools come in (there was no equivalent in the 18th-century seafaring world, but sailors would definitely have appreciated one!).
Tools like Coalesce let engineers create models from pre-defined templates in a graphical user interface. Those templates are built by engineering specialists and can generate models following Data Vault 2.0, Kimball and other modelling techniques, as well as connect to AI engines (like Snowflake Cortex).
Using this tool is much easier for the engineers: the models are much less prone to error (because of the templates), documentation and data lineage are generated automatically for your team, and your best practices are always applied without constant reminders. The senior engineers can modify these templates if they wish and can focus on the heavy lifting they were put on this earth for.
Furthermore, you can employ more business-oriented IT members, prioritizing business knowledge over technical knowledge (within reason). If you would like to learn more about Coalesce, please visit us at https://www.phronesispath.com/service/coalesce/.
Also, do not be tempted to build a crew of only inexperienced engineers. You will be lucky if your ship leaves the harbor, and it will most likely sink once it reaches the open sea.
Can you imagine a ship with multiple captains whose voices carry equal weight? This was unimaginable for a ship's crew (and still is). On the open sea, the captain's word was equal to that of God himself. Of course, even captains had crewmates to discuss solutions with, in the form of the First, Second and Third Mate (sometimes more). But the captain's word on the course of action was final.
Nowadays companies fall into the trap of democratization, mistaking agile techniques for seeking compromise at all costs. In reality, discussion is always welcome, but then one person (the Captain) needs to make the decision, and it is final. Consensus is rarely achieved in the real world, and chasing it will lead the ship into murky waters. It should be made clear that all voices will be heard but that the Project Manager has the final word on the course of the Data Warehouse project.
Of course, you need to have a competent sailor in the captain's seat. With a weak captain, it doesn’t matter how good the crew is.
Sometimes the problem is different: there is one “official” captain, but there are also people in high seats in the trade company who influence the crew directly with their own agenda or have the power to overrule the captain. This is the equivalent of a “powerful” stakeholder descending from the heavens to impose his commandments on the Data Warehouse project.
You can imagine this would be unthinkable on a ship on the open sea. The Project Manager (Captain) is in the best position to make decisions, as he is very close to the Data Warehouse project (the sailing ship) and has his trusted senior advisors on board (his Mates). No one on distant land will make a better decision, especially if they have never commanded a ship. Anyone else giving orders will just cause confusion, and the morale of the team will suffer.
If you select a Project Leader, Project Manager or Product Owner for a Data Warehouse project, you need to empower them to be the “true” captain that you trust and not just a figurehead. That person needs to be able to make the final call and have the mandate to do so. If you would like to learn more about how to create an environment where this is possible, please visit us at https://www.phronesispath.com/service/data-strategy/.
When you consider starting a Data Warehouse project, it’s not only about the money and technology but also about the processes, the team you have or can onboard, and how far your organizational culture empowers your Captains. Keep in mind that a Data Warehouse is a means to an end and not the end itself (much like ships make no sense without destinations). Reach out to us if you would like us to help you design the perfect ship :).
At Phronesis Path we wish you Safe Travels and Calm Waters!