Big Bang or Incremental, What’s the Best Way to Build a Data Lake?

By Cary Moore, Senior Director of Data Science

No cloud-driven data lake yet? Don’t worry… Avoid the stress and follow this guide.

In 2019, it still seems a bit odd to be discussing the merits of a data lake. However, many organizations have been slow to adopt the cloud, and thus data lake implementation remains elusive – a distant strategy – with benefits yet to be realized. Many of my customers still have decision paralysis as they sort through the hype, reality, pros, cons, vendors, technology options, architecture, security, privacy, business cases, increasingly precious funding, and a dizzying array of other endless concerns. The common thread of anxiety seems to be one question: deploy “big bang” or build incrementally over time?

Like any good consultant, I’ll answer “it depends” – of course – on the use cases; that is, what is the purpose of the data lake? What business needs will it address?

In my experience, once a couple of foundational steps are in place, there is huge benefit to deploying a cloud-based data lake – and it’s hard to make a mistake you can’t recover from quickly. So why the anxiety? A handful of basic decisions will guide your strategy.

1. At the risk of stating the obvious, you need to pick a cloud provider.
Choosing a provider can cause hesitancy, as the fear of picking the wrong one is often the biggest concern. Fear not! With a minimal investment, you can start with any of the major cloud providers. That initial investment won’t be lost, because switching costs – moving the data, porting over procedural logic and machine learning code – are relatively low. Account setup, configuration, and acquiring storage and processing capacity are similar across all of the major vendors, and the experience is useful whichever provider you ultimately choose for your data lake. So even if your organization has no established cloud strategy, or has yet to select a cloud vendor, that should not prevent you from taking your first steps.
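To show just how little ceremony is involved, here’s a minimal sketch using the AWS SDK for Python (boto3) to stand up the object storage that anchors a lake; the Google Cloud and Azure SDKs need a comparably small handful of calls. The bucket name is purely hypothetical.

```python
import boto3

# Standing up the lake's object storage is a couple of calls on any provider.
s3 = boto3.client("s3", region_name="us-east-1")

# Create the bucket that will hold the raw zone of the lake
# (the bucket name is illustrative only).
s3.create_bucket(Bucket="acme-data-lake-raw")
```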

2. Security is necessarily a major concern.
No one wants their company to be the next data breach headline on a major news channel. For any company, exposing sensitive data could be both a financial and a reputational catastrophe. So be certain to follow preferred practices: when setting up security services as part of your cloud strategy, use your data management, InfoSec, internal audit, and compliance policies to assess the cloud vendor for potential exposures.
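As a concrete starting point, here’s a minimal sketch, assuming AWS S3 via boto3, of two guardrails most InfoSec policies will demand on day one: default encryption at rest and a block on all public access. The bucket name is hypothetical.

```python
import boto3

s3 = boto3.client("s3")
bucket = "acme-data-lake-raw"  # hypothetical lake bucket

# Enforce server-side encryption by default for everything landed in the lake.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
        ]
    },
)

# Block all public access at the bucket level.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```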

3. When establishing storage services, review roles, responsibilities, and access controls.
Regular, frequent reviews of access controls can prevent unintended or unauthorized access grants from lingering. These data governance processes reduce the risk of unaudited use of key company information.
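A simple automated sweep can anchor those reviews. Here’s a minimal sketch, again assuming AWS via boto3, that flags buckets whose public-access guardrails are missing or incomplete – a reasonable first item on a recurring access-review checklist.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Walk every bucket in the account and flag weak public-access settings.
for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        config = s3.get_public_access_block(Bucket=name)[
            "PublicAccessBlockConfiguration"
        ]
        if not all(config.values()):
            print(f"REVIEW: {name} has public-access settings partially open")
    except ClientError:
        # No public-access block configured at all: highest priority to review.
        print(f"REVIEW: {name} has no public access block configured")
```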

4. The next big hurdle is deciding how to build the data lake.
“Big bang” or incrementally? The latter is the obvious answer. But how? Time-to-value is everything – and the value should be driven by the prioritized business use cases that the data lake is intended to solve. Today, agile is the preferred delivery method – ensuring each use case is realized in small, well-defined increments – building upon the foundation one step at a time.
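One way to keep those increments concrete is a zoned, partitioned layout: each sprint lands one new source for one use case without disturbing what’s already in the lake. Here’s a minimal sketch assuming an S3-backed raw zone; the bucket, source, and file names are all hypothetical.

```python
from datetime import date
from pathlib import Path

import boto3

s3 = boto3.client("s3")

def land_raw(source: str, local_file: str,
             bucket: str = "acme-data-lake-raw") -> str:
    """Upload one extract under raw/<source>/<yyyy-mm-dd>/ so each use case
    adds its own sources without touching what is already in the lake."""
    key = f"raw/{source}/{date.today().isoformat()}/{Path(local_file).name}"
    s3.upload_file(local_file, bucket, key)
    return key

# First increment: just the orders feed needed by the first use case.
land_raw("orders", "/tmp/orders_extract.csv")
```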

5. Leverage the plethora of machine learning (ML) capabilities every cloud vendor should provide.
These rich open source libraries are necessary to find “nuggets of gold” in the oceans of data. Applying the right automation to your ML strategy can dramatically increase delivery velocity and result throughput.
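As a taste of what that looks like in practice, here’s a minimal sketch using the open source scikit-learn library: an unsupervised anomaly detector that surfaces unusual records in lake data, no labels required. The file path and column names are hypothetical, and reading s3:// paths with pandas assumes the s3fs package is installed.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical extract already curated in the lake (requires s3fs to read).
df = pd.read_parquet("s3://acme-data-lake-curated/orders/orders.parquet")

# Fit an unsupervised outlier detector on two illustrative numeric columns.
model = IsolationForest(contamination=0.01, random_state=42)
df["anomaly"] = model.fit_predict(df[["order_value", "item_count"]])

# -1 marks the rare, unusual orders worth a closer look.
print(df[df["anomaly"] == -1].head())
```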

So, to break it down:

  1. Pick a cloud vendor; remember, any one will do, but it’s best to know your long-term strategy.
  2. Know and apply InfoSec privacy and security protocols.
  3. Define the access grants for authorized users.
  4. Define the business use cases that drive the prioritization of the incremental data lake sourcing strategy, so you deliver value early and often.
  5. Apply automated ML to discern hidden patterns and trends which can drive your company’s business strategic plans.

The biggest benefit of having a data lake is the opportunity to address data-driven business requirements as quickly and inexpensively as possible. As the capability matures, more governance and structure can be formalized and implemented. Don’t delay – start your data lake today.
