Discovery Bundle — Data Management Fundamentals

Module 7 · Section 1

Infrastructure is not the goal — it is the delivery system

This section explains what infrastructure is supposed to do for AI work: make data usable, models deployable, and decisions repeatable.

Infrastructure should not be designed to impress. It should be designed to reduce friction, support governance, and make the right work easier to repeat.

For most teams, the real question is not “What is the most advanced architecture?” It is: What is the simplest architecture that supports our current AI goals safely?

Access → Scalability → Governance → Repeatability → Optionality

Deep Dive: What “good infrastructure” looks like for startups and SMEs

🏗️ Example — over-built vs. right-sized starter infra

✗ Over-built

GPU cluster + Kubernetes + multi-region for a prototype with 5 internal users. → Months of setup, high cost, mostly idle — and complexity you now have to operate.

✓ Right-sized

Managed model API + one small VM + object storage. Add monitoring & backups. → Live in days. Move to dedicated GPUs or HPC once demand is actually proven.

Infrastructure should follow validated demand — match the investment to the stage: discovery first, scale when the value is real.

Infrastructure is often misunderstood. When people hear the word, they may think of servers, GPUs, cloud accounts, data centres, supercomputers, storage systems, Kubernetes, networks, or dashboards. These things matter, but they are not the real goal.

The goal of infrastructure is to make useful work possible, repeatable, secure, and scalable. For AI, infrastructure is the delivery system that connects data, models, people, workflows, and decisions. It determines whether a team can move from experiment to pilot, from pilot to product, and from product to reliable operation.

For startups and SMEs, this distinction is important. You do not need impressive infrastructure for its own sake. You need infrastructure that matches the maturity of your AI use case. Too little infrastructure creates fragile prototypes that nobody can maintain. Too much infrastructure creates cost, complexity, and delay before the value is proven.

Infrastructure should not be designed to impress. It should reduce friction, support governance, make work repeatable, and give teams the right amount of compute, data access, and control for their current AI goals.

1. Infrastructure is the system behind the system

An AI application may appear simple to the user: type a question, upload a document, click a button, receive an answer. But behind that interaction, many components may be involved: data storage, document retrieval, model APIs, GPU servers, authentication, logging, monitoring, evaluation, deployment pipelines, and human review workflows.

Infrastructure is the foundation that lets those components work together. It answers practical questions:

Where does the data live?
Who is allowed to access it?
Where does the model run?
How are experiments tracked?
How are models, prompts, and datasets versioned?
How does the system move from development to production?
How are costs, errors, latency, and usage monitored?
How can the system be changed without starting from zero?

These questions may sound technical, but they directly affect business outcomes. If data cannot be accessed, AI projects stall. If experiments cannot be reproduced, decisions become guesswork. If deployment is manual, every update is risky. If monitoring is missing, failures become visible only after users lose trust.

Good infrastructure makes the right behaviour easy. Bad infrastructure forces every team to invent its own workaround.

2. The five jobs of AI infrastructure

For this module, it is useful to think about infrastructure through five jobs: access, scalability, governance, repeatability, and optionality.

Access

Access means that teams can reach the data, tools, environments, and compute they need without manual heroics. If every AI experiment requires someone to copy files manually, request credentials informally, or wait weeks for a machine, the organization will move slowly.

Good access does not mean everyone can see everything. It means the right people can access the right resources with the right permissions.

Scalability

Scalability means the system can grow when the workload grows. This may involve more users, more documents, larger datasets, bigger models, more inference requests, or more training jobs.

Scalability does not always mean “massive scale”. For an SME, it may simply mean that the system still works when five users become fifty, when a prototype becomes a pilot, or when a monthly batch job becomes a daily workflow.

Governance

Governance means that the organization can answer who owns data, who accessed it, where it is processed, how it is protected, and what happens when something changes.

In AI systems, governance includes data access, model access, prompt ownership, logging, evaluation records, security controls, and compliance requirements.

Repeatability

Repeatability means that experiments, deployments, and workflows can be run again without rebuilding everything from scratch. This is essential for debugging, improvement, audits, handover, and scaling.

If a model worked once in a notebook but nobody can reproduce the result, the organization has a demo, not a reliable capability.
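One lightweight way to make a notebook result reproducible is to record a small manifest for every run. The sketch below is only an illustration: the function names, the manifest fields, and the example model name are assumptions, not part of any specific tool.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a short, stable hash of any JSON-serialisable object."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def build_run_manifest(config: dict, dataset_rows: list, prompt: str) -> dict:
    """Collect the inputs someone would need to rerun or audit an experiment."""
    return {
        "config": config,
        "config_hash": fingerprint(config),
        "dataset_hash": fingerprint(dataset_rows),
        "prompt_hash": fingerprint(prompt),
    }


# Hypothetical experiment inputs, used only for illustration.
manifest_a = build_run_manifest(
    {"model": "example-model", "temperature": 0.0, "seed": 7},
    [{"id": 1, "text": "invoice overdue"}],
    "Classify the ticket: {text}",
)
# Same configuration with keys in a different order: the hashes still match.
manifest_b = build_run_manifest(
    {"seed": 7, "temperature": 0.0, "model": "example-model"},
    [{"id": 1, "text": "invoice overdue"}],
    "Classify the ticket: {text}",
)
print(manifest_a == manifest_b)
```

Storing a manifest like this next to each result makes it possible to tell later whether two runs used the same configuration, data, and prompt.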

Optionality

Optionality means the organization avoids unnecessary lock-in. This does not mean avoiding all vendors or platforms. It means keeping enough flexibility to change model providers, hosting locations, architectures, or tools when business needs change.

Optionality is valuable because AI technology changes quickly. Today’s best model, platform, or deployment pattern may not be tomorrow’s best choice.
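In code, optionality often comes down to keeping provider-specific details behind a small interface. The following is a minimal sketch under stated assumptions: `TextModel`, `HostedApiModel`, and `LocalModel` are hypothetical names, and the classes return canned strings instead of calling a real service.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only thing the application is allowed to know about a model."""

    def complete(self, prompt: str) -> str: ...


class HostedApiModel:
    """Stand-in for a managed model API (no real network call is made)."""

    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"


class LocalModel:
    """Stand-in for a self-hosted model."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


def summarise(model: TextModel, text: str) -> str:
    # Application logic depends on the interface, not on a vendor SDK.
    return model.complete(f"Summarise: {text}")


print(summarise(HostedApiModel(), "quarterly report"))
print(summarise(LocalModel(), "quarterly report"))
```

Switching providers then means writing one new adapter class rather than touching every place the application calls a model.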

3. Infrastructure should fit the stage

The right infrastructure depends on the stage of the AI initiative. A small experiment, an internal pilot, a customer-facing product, and a regulated production workflow do not need the same infrastructure.

For an early experiment, the team may only need:

a notebook or simple development environment,
sample data,
access to a model API or local model,
basic prompt and result tracking,
a small evaluation set.

For a controlled pilot, the needs grow:

role-based access,
better data management,
versioned prompts and datasets,
repeatable deployment,
monitoring of usage, failures, cost, and latency,
clear ownership.

For production, infrastructure must support:

authentication and authorization,
reliable hosting,
backups and recovery,
logging and audit trails,
security review,
incident response,
rollback procedures,
ongoing evaluation and monitoring.

The mistake is to jump directly to production-grade complexity before the use case is validated. The opposite mistake is to keep using a fragile prototype after the organization has started relying on it.

Practical rule: Infrastructure should mature with the use case. Start simple, but do not let a successful prototype become business-critical without upgrading the foundation around it.
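The stage lists above can be turned into a simple gap check. This is a sketch only: the short labels paraphrase the lists in this section, and the function name is an assumption.

```python
# Capabilities per stage, paraphrased from the lists in this section.
STAGE_REQUIREMENTS = {
    "experiment": {
        "development environment", "sample data", "model access",
        "prompt and result tracking", "small evaluation set",
    },
    "pilot": {
        "role-based access", "data management", "versioned prompts and datasets",
        "repeatable deployment", "usage and cost monitoring", "clear ownership",
    },
    "production": {
        "authentication and authorization", "reliable hosting", "backups and recovery",
        "logging and audit trails", "security review", "incident response",
        "rollback procedures", "ongoing evaluation and monitoring",
    },
}


def missing_for_stage(stage: str, in_place: set) -> list:
    """Return the capabilities a stage expects that are not yet in place."""
    return sorted(STAGE_REQUIREMENTS[stage] - set(in_place))


gaps = missing_for_stage(
    "pilot", {"role-based access", "clear ownership", "data management"}
)
print(gaps)
```

A check like this is useful mainly as a conversation starter before a prototype is promoted to the next stage.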

4. Storage and compute: the visible foundation

Storage and compute are the most visible parts of infrastructure.

Storage is where data, documents, model artefacts, logs, embeddings, evaluation sets, and outputs live. This could be a cloud bucket, database, data warehouse, data lake, file system, document repository, or HPC filesystem.

Compute is where workloads run. This could mean a laptop, a cloud VM, a server, a GPU instance, a Kubernetes cluster, a managed AI service, an edge device, or a supercomputer.

Different AI workloads have different compute needs:

prompting and simple API use may need little local compute,
RAG indexing may need storage, embedding generation, and search infrastructure,
batch document processing may need scalable but temporary compute,
model inference may need predictable latency and cost,
fine-tuning may need GPU capacity,
large-scale training may require distributed GPU systems and specialised storage.

For decision makers, the key is not to buy the largest machine. The key is to understand the workload. A chatbot prototype, a document extraction workflow, a demand forecasting model, a RAG assistant, and a large model fine-tuning run all stress infrastructure in different ways.

5. Resource management: making shared compute usable

Once more than one person or workflow uses compute resources, resource management becomes important. Without it, teams compete informally for machines, GPUs, memory, and storage.

Resource management answers:

Who can run jobs?
Which resources can they request?
How are jobs queued or scheduled?
How are priorities handled?
How are costs or quotas tracked?
How are failed jobs diagnosed?

In cloud environments, this may involve managed services, autoscaling, quotas, and budget limits. In HPC environments, this often involves job schedulers, partitions, queues, modules, filesystems, and batch scripts. In both cases, the goal is the same: make expensive shared resources usable without chaos.
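To make the idea of queues, priorities, and quotas concrete, here is a deliberately simplified sketch. It is not how any real scheduler or cloud service works internally; the `Job` fields, the quota units (GPU hours), and the rule that lower numbers mean higher priority are all assumptions for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                      # lower number runs first (assumption)
    submitted: int                     # tie-breaker: earlier submission first
    user: str = field(compare=False)
    gpu_hours: float = field(compare=False)


def schedule(jobs, quotas):
    """Order jobs by priority and reject those that exceed the user's quota."""
    used = {}
    queue = list(jobs)
    heapq.heapify(queue)
    accepted, rejected = [], []
    while queue:
        job = heapq.heappop(queue)
        remaining = quotas.get(job.user, 0.0) - used.get(job.user, 0.0)
        if job.gpu_hours <= remaining:
            used[job.user] = used.get(job.user, 0.0) + job.gpu_hours
            accepted.append(job)
        else:
            rejected.append(job)
    return accepted, rejected


jobs = [
    Job(priority=2, submitted=1, user="ana", gpu_hours=4),
    Job(priority=1, submitted=2, user="ben", gpu_hours=10),
    Job(priority=1, submitted=3, user="ana", gpu_hours=8),
]
accepted, rejected = schedule(jobs, quotas={"ana": 10, "ben": 8})
print([(j.user, j.gpu_hours) for j in accepted])
print([(j.user, j.gpu_hours) for j in rejected])
```

Even this toy version answers several of the questions above: who may run jobs, in what order, and what happens when a request exceeds the quota.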

For SMEs using shared European HPC resources, this can be a new way of working. Instead of starting a GPU machine instantly and keeping it running, users submit jobs to a scheduler. That may feel unfamiliar, but it is what allows many users to share very powerful systems fairly and efficiently.

6. Development environments: where AI work actually happens

The development environment is where people write code, run notebooks, test prompts, debug pipelines, and explore data. It is often the part of infrastructure that users feel most directly.

A good development environment should make common work easy:

installing and managing dependencies,
running notebooks or scripts,
accessing approved data,
testing prompts and RAG pipelines,
tracking experiments,
using GPUs when needed,
moving from prototype to repeatable workflow.

Many AI projects struggle because the development environment is informal. One person’s laptop has the right packages. Another person cannot reproduce the setup. A notebook works once, but the environment changes. A model file sits in someone’s home directory. API keys are copied into code.

Good infrastructure reduces this friction. It provides shared environment definitions, version control, secrets management, documentation, and repeatable setup steps.
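A small example of what secrets management means in practice: the key is read from the environment at run time instead of being pasted into a notebook. The variable name `MODEL_API_KEY` and both helper functions are assumptions for this sketch; a team may prefer a dedicated secrets manager.

```python
import os


def load_api_key(var_name: str = "MODEL_API_KEY") -> str:
    """Read the key from the environment so it never appears in code or notebooks."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key


def mask(secret: str) -> str:
    """Return a version of the secret that is safe to show in logs."""
    return secret[:4] + "..." if len(secret) > 8 else "***"


# Demonstration with a made-up value; a real key would come from load_api_key().
print(mask("example-secret-value"))
```

Because the code fails loudly when the variable is missing, a new team member finds out immediately what setup step they skipped.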

7. Model and application infrastructure

AI infrastructure is not only about training models. Many modern AI systems use existing foundation models through APIs, self-hosted inference, or hybrid patterns.

The application infrastructure may include:

model endpoints,
prompt templates,
RAG pipelines,
embedding generation,
vector databases or search indexes,
tool connectors,
guardrails,
output validation,
evaluation datasets,
monitoring dashboards.

This matters because the model is only one component. A weak retrieval pipeline can make a strong model look bad. Poor logging can make failures impossible to diagnose. Missing output validation can break downstream workflows. Unclear prompt ownership can make behaviour drift over time.

Good AI infrastructure therefore supports the whole application lifecycle, not only the model lifecycle.

8. Cloud, on-prem, edge, and HPC are not enemies

Infrastructure choices are often presented as opposites: cloud versus on-premises, managed versus self-hosted, local versus remote, HPC versus cloud. In practice, many organizations use combinations.

A team might use:

a managed cloud model API for early prototypes,
cloud storage for documents and application data,
an on-premises system for sensitive internal data,
edge devices for low-latency local inference,
HPC or EuroHPC systems for large training, fine-tuning, simulation, or batch processing,
a custom application that connects several of these environments.

The key is to be intentional. Hybrid architectures can solve real problems: data residency, specialised hardware, legacy systems, burst capacity, cost control, or access to advanced compute. But hybrid setups also add complexity: governance, identity management, data movement, monitoring, skills, and operational responsibility.

For this reason, hybrid should not be chosen because it sounds sophisticated. It should be chosen because the use case needs it.

9. HPC and EuroHPC: shared advanced compute as an option

High-performance computing can be valuable when the workload is too large, too expensive, or too specialised for ordinary local machines. For AI, this may include fine-tuning large models, running large experiments, processing many documents or images, training vision models, running simulations, or benchmarking scalable workflows.

European initiatives such as EuroHPC and AI Factories are important because they make advanced compute more accessible to companies that would not normally own such systems. For startups and SMEs, this can reduce the barrier to experimenting with serious GPU capacity.

However, HPC is not the same experience as a simple SaaS tool. Users often need to learn:

accounts and authentication,
login nodes and compute nodes,
job schedulers,
filesystems,
software modules,
containers or environments,
data transfer,
batch jobs and queues.

This learning curve is manageable, especially with onboarding support. But it means HPC should be used for workloads where the compute advantage justifies the operational setup.

Practical rule: Use HPC when you need substantial parallel compute, specialised GPUs, large-scale experiments, or access to publicly supported advanced infrastructure. Do not choose HPC merely because it sounds powerful when a simpler setup can handle the workload.

10. Performance engineering: infrastructure is about bottlenecks

AI systems often fail to use infrastructure efficiently. A team may rent a powerful GPU and still get poor performance because data loading is slow, memory is insufficient, networking is a bottleneck, storage is overloaded, or the model is not configured efficiently.

Performance engineering asks where the real bottleneck is:

Is the workload limited by compute?
Is it limited by memory capacity?
Is it limited by memory bandwidth?
Is data loading too slow?
Is network communication slowing distributed training?
Is storage I/O limiting throughput?
Is the model too large for the latency or cost target?
Is the system underusing expensive GPUs?

For decision makers, the important insight is that more hardware is not always the answer. Sometimes the right fix is better batching, caching, data layout, model quantization, smaller models, better storage, optimized inference, or a simpler architecture.

Infrastructure decisions should therefore be informed by measurement, not assumptions.
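Measurement can start very simply. The sketch below times two stages of a toy pipeline to see which one dominates. The stage names and the artificial `time.sleep` delay are assumptions that stand in for real storage I/O and real model computation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)


@contextmanager
def timed(stage: str):
    """Add the wall-clock time spent inside the block to the stage total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def load_batch(index: int) -> list:
    time.sleep(0.02)  # stand-in for slow storage or network I/O
    return list(range(1000))


def compute(batch: list) -> int:
    return sum(x * x for x in batch)  # stand-in for model computation


for i in range(5):
    with timed("data_loading"):
        batch = load_batch(i)
    with timed("compute"):
        compute(batch)

slowest = max(timings, key=timings.get)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
print("bottleneck:", slowest)
```

In this toy case a faster processor would change almost nothing, because nearly all the time is spent waiting for data.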

11. Cost: infrastructure creates both capability and obligation

Infrastructure creates capability, but it also creates ongoing cost. Cloud services can scale quickly, but costs can grow with usage. Self-hosted systems may offer more control, but require maintenance, staff, electricity, upgrades, security, and utilization management. HPC access can provide powerful resources, but teams must use allocations efficiently.

AI cost is often shaped by:

model size,
number of users,
number of model calls per workflow,
input and output token volume,
embedding and indexing frequency,
storage volume,
GPU utilization,
data transfer,
monitoring and logging retention,
engineering and operations time.

A prototype may look cheap because only a few people use it. The same system in production can cost far more once hundreds of users, larger documents, more requests, and stricter monitoring are included.

Good infrastructure therefore includes cost visibility. Teams should know what each workflow costs, not only whether it works.
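A back-of-the-envelope estimate shows how quickly usage changes the picture. Everything numeric here is hypothetical: the token prices, the call counts, and the usage figures are placeholders chosen for easy arithmetic, not real provider pricing.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    calls_per_run: int            # model calls needed for one workflow run
    input_tokens_per_call: int
    output_tokens_per_call: int


def cost_per_run(wf: Workflow, price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Token cost of one run, given prices per 1,000 tokens."""
    input_cost = wf.calls_per_run * wf.input_tokens_per_call / 1000 * price_in_per_1k
    output_cost = wf.calls_per_run * wf.output_tokens_per_call / 1000 * price_out_per_1k
    return input_cost + output_cost


def monthly_cost(wf, users, runs_per_user_per_day, working_days,
                 price_in_per_1k, price_out_per_1k) -> float:
    runs = users * runs_per_user_per_day * working_days
    return runs * cost_per_run(wf, price_in_per_1k, price_out_per_1k)


# Hypothetical workflow and hypothetical prices.
doc_qa = Workflow("document Q&A", calls_per_run=3,
                  input_tokens_per_call=2000, output_tokens_per_call=500)
PRICE_IN, PRICE_OUT = 0.001, 0.002

prototype = monthly_cost(doc_qa, users=5, runs_per_user_per_day=10,
                         working_days=20, price_in_per_1k=PRICE_IN,
                         price_out_per_1k=PRICE_OUT)
production = monthly_cost(doc_qa, users=300, runs_per_user_per_day=10,
                          working_days=20, price_in_per_1k=PRICE_IN,
                          price_out_per_1k=PRICE_OUT)
print(f"prototype:  {prototype:.2f} per month")
print(f"production: {production:.2f} per month")
```

The estimate covers token costs only. Storage, monitoring, data transfer, and engineering time from the list above would need their own lines.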

12. Governance and security are infrastructure concerns

AI governance is not only a policy document. It must be implemented in infrastructure: identity management, access control, logging, data retention, model permissions, encryption, audit trails, approval workflows, and monitoring.

For example:

A RAG assistant should retrieve only documents the user may access.
A model endpoint should not be callable by everyone without limits.
API keys should not be stored inside notebooks or scripts.
Logs should not expose unnecessary sensitive prompts or outputs.
Production deployments should have rollback paths.
High-impact actions should require review or approval.

If governance is not supported by infrastructure, it depends on people remembering rules manually. That rarely scales.
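The first example above, permission-aware retrieval, can be sketched in a few lines. This is an illustration under simplifying assumptions: documents carry a set of allowed groups, and relevance is plain keyword overlap rather than a real search index or embedding model. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset


def retrieve(query: str, documents: list, user_groups: set, top_k: int = 3) -> list:
    """Filter by permission first, then rank only what the user may see."""
    visible = [d for d in documents if d.allowed_groups & user_groups]
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


docs = [
    Document("hr-1", "salary bands for 2024", frozenset({"hr"})),
    Document("kb-1", "how to reset your password", frozenset({"staff", "hr"})),
    Document("kb-2", "salary payment dates", frozenset({"staff"})),
]

print([d.doc_id for d in retrieve("salary bands", docs, {"staff"})])
print([d.doc_id for d in retrieve("salary bands", docs, {"hr"})])
```

The design choice that matters is the order of operations: restricted documents are removed before ranking, so they can never reach the model's context for a user who lacks access.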

13. Build versus buy: avoid infrastructure vanity projects

A recurring infrastructure decision is whether to build, buy, rent, or share. For AI, this question appears everywhere: data platforms, vector databases, model hosting, orchestration, monitoring, GPU infrastructure, and evaluation tooling.

Building gives control, but creates responsibility. Buying gives speed, but creates dependency. Renting gives flexibility, but may increase long-term cost. Shared infrastructure, such as EuroHPC access, can provide powerful capability, but may require learning different workflows.

The best choice depends on:

how strategic the capability is,
how sensitive the data is,
how much customization is needed,
how much internal talent is available,
how predictable the workload is,
how important cost control is,
how much operational responsibility the organization can handle.

Avoid infrastructure vanity projects. Do not build a complex platform because large tech companies have one. Build or adopt what your current and near-future AI work actually needs.

14. What “good enough” infrastructure looks like

For a startup or SME, good enough infrastructure is neither the bare minimum nor enterprise overkill. It is the smallest foundation that supports safe progress.

For early AI adoption, that usually means:

clear data locations and owners,
approved tools for model access,
basic rules for sensitive data,
shared development environments or setup instructions,
version control for code and important prompts,
a way to store and update documents for RAG,
basic logging and cost tracking,
human review for risky outputs,
documented handover for prototypes.

For more mature AI initiatives, add:

repeatable deployment pipelines,
role-based access control,
monitoring and alerting,
evaluation pipelines,
model and prompt registries,
secure secret management,
incident response,
infrastructure-as-code or documented provisioning,
clear ownership for operations.

The right level depends on risk. Internal brainstorming needs little. Customer-facing, regulated, or business-critical AI needs much more.

15. What this means for decision makers

Decision makers do not need to understand every infrastructure detail. But they do need to ask the right questions before approving AI projects.

Useful questions include:

What stage is this AI initiative in: experiment, pilot, or production?
What data does it need, and who owns that data?
Where will the model run?
Is the workload small, spiky, latency-sensitive, or compute-heavy?
Do we need cloud, self-hosting, HPC, edge, or a combination?
Who can access the system and its data?
How will we reproduce experiments or deployments?
How will we monitor cost, latency, quality, and failures?
Who maintains the system after the first demo?
What would force us to change architecture later?

These questions turn infrastructure from a technical afterthought into a strategic enabler. The aim is not to predict every future requirement. The aim is to avoid obvious traps: inaccessible data, untracked experiments, uncontrolled costs, fragile prototypes, security gaps, and architectures that cannot evolve.

Bottom line: Infrastructure is not the goal of AI adoption. It is the delivery system that makes AI work usable, repeatable, governable, scalable, and changeable. For startups and SMEs, the best infrastructure is not the most complex one. It is the simplest foundation that supports the next safe and valuable step.


Module 7 · Section 2

Warehouse vs lake vs lakehouse vs mesh

This section helps you understand the role of common data architectures and when each one is actually useful.

Deep Dive: Choosing a data architecture without overcomplicating it

When organizations talk about AI infrastructure, they often focus on models, GPUs, cloud accounts, or APIs. But before a model can be useful, the organization needs somewhere to store, prepare, govern, and access data. That is where data architecture comes in.

Terms such as data warehouse, data lake, data lakehouse, and data mesh are often used as if they were interchangeable. They are not. They solve different problems. Choosing between them is not mainly about which name sounds most modern. It is about what kind of data you have, who uses it, how much structure you need, how quickly the data changes, and how much governance is required.

For startups and SMEs, the most important lesson is simple: do not copy enterprise architecture patterns before you have enterprise problems. A small organization may not need a full data lakehouse platform or a formal data mesh. But it does need to understand what these patterns are trying to solve, because the same problems appear in smaller forms: scattered data, unclear ownership, duplicated files, inconsistent reports, poor access control, and AI systems built on unreliable sources.

Data architecture should make data easier to use, trust, govern, and reuse. If an architecture creates more confusion than clarity, it is not the right architecture yet.

1. Why this matters for AI

AI systems depend on data in several ways. A predictive model needs training and evaluation data. A RAG system needs documents and metadata. An AI assistant needs context. A monitoring system needs logs and feedback. A dashboard needs clean and consistent metrics. A fine-tuning project needs high-quality examples.

If the data architecture is weak, the AI system inherits that weakness. Common symptoms include:

teams cannot find the data they need,
the same metric has different definitions in different departments,
documents are duplicated or outdated,
data pipelines are manual and fragile,
access rights are unclear,
reports cannot be reproduced,
AI systems retrieve the wrong documents,
models are trained on data that nobody can reconstruct.

Good architecture does not guarantee good AI, but bad architecture makes reliable AI much harder. The question is therefore not “Which architecture is fashionable?” The question is: what architecture makes our data usable, trustworthy, and maintainable for the use cases we actually have?

2. Data warehouse: structured, trusted, and optimized for analytics

A data warehouse is optimized for structured analytics and reporting. It is usually designed around business questions: revenue, customers, products, transactions, inventory, churn, marketing performance, finance, operations, and other well-defined metrics.

In a warehouse pattern, data is typically cleaned, transformed, modeled, and organized before most users query it. This is often called schema-on-write: the structure is defined when data is loaded or prepared.
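Schema-on-write can be illustrated with Python's built-in `sqlite3` module standing in for a warehouse. The table name, columns, and sample rows are invented for this sketch. The point is that rows which break the declared structure are rejected at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The structure and its rules are declared before any data is loaded.
conn.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount_eur REAL NOT NULL CHECK (amount_eur >= 0),
        order_date TEXT NOT NULL
    )
""")

rows = [
    (1, "Acme", 120.0, "2024-03-01"),
    (2, "Borealis", -5.0, "2024-03-02"),  # violates the CHECK constraint
    (3, None, 40.0, "2024-03-02"),        # violates NOT NULL
]

rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row[0], str(exc)))

total = conn.execute("SELECT SUM(amount_eur) FROM orders").fetchone()[0]
print("loaded total:", total)
print("rejected order ids:", [order_id for order_id, _ in rejected])
```

Because bad rows never enter the table, every later query can rely on the same definitions, which is what makes warehouse reports consistent.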

Warehouses are strong when:

data is mostly structured,
business definitions are stable,
many users need consistent reports,
performance for SQL queries matters,
governance and access control are important,
the organization needs one version of core business metrics.

Typical warehouse use cases include:

business intelligence dashboards,
financial reporting,
sales analytics,
customer segmentation,
inventory reporting,
management KPIs,
clean training tables for predictive models.

For AI, the warehouse can be a strong source of structured, curated data. If you want to train a churn model, forecast demand, evaluate sales performance, or create reliable features from business data, a warehouse-style approach is often useful.

The limitation is flexibility. Warehouses are usually less natural for raw, messy, semi-structured, or unstructured data such as PDFs, images, audio files, clickstreams, log files, chat transcripts, and large document collections. They are also less suitable when the organization wants to preserve raw data for future use cases that are not yet known.

Practical rule: Use a warehouse mindset when consistency, reporting, and trusted structured metrics matter more than raw flexibility.

3. Data lake: flexible storage for raw and varied data

A data lake is a centralized repository for storing large volumes of data in its original or near-original form. Unlike a warehouse, it does not require all data to be heavily structured before it is stored.

Data lakes are useful because modern organizations generate many types of data:

CSV and Excel files,
JSON and XML records,
application logs,
sensor data,
images and videos,
documents and PDFs,
customer messages,
web events,
model outputs and traces.

A lake is often based on scalable object storage. It is attractive because data can be stored cheaply and flexibly before all future uses are known. This is often described as schema-on-read: structure is applied when the data is read or processed for a specific task.
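Schema-on-read can be sketched in the same spirit. Here raw event lines are kept exactly as they arrived, and structure is applied only when a specific task reads them. The event fields and the `read_orders` function are assumptions made up for this example.

```python
import json

# Raw lines stored as they arrived: mixed record types, inconsistent field
# names, and one corrupt line.
RAW_EVENTS = """\
{"type": "order", "customer": "Acme", "amount": "120.00"}
{"type": "order", "customer": "Borealis", "amount_eur": 40}
{"type": "page_view", "url": "/pricing"}
not valid json
"""


def read_orders(raw: str):
    """Apply an 'order' schema at read time and skip anything that does not fit."""
    for line in raw.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupt line: a real pipeline would log or quarantine it
        if not isinstance(record, dict) or record.get("type") != "order":
            continue
        amount = record.get("amount_eur", record.get("amount"))
        if amount is None:
            continue
        yield {
            "customer": record.get("customer", "unknown"),
            "amount_eur": float(amount),
        }


orders = list(read_orders(RAW_EVENTS))
print(orders)
```

Nothing was rejected when the data was stored, so the cleaning burden moves to every reader. That is the flexibility and the risk of a lake in miniature.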

Data lakes are strong when:

data is varied or unstructured,
raw data should be preserved,
future use cases are uncertain,
data science teams need exploratory access,
large-scale processing is required,
AI workloads need access to raw documents, logs, images, or events.

For AI, data lakes are useful because many AI use cases start with data that does not fit neatly into relational tables. A RAG system may need documents. A computer vision project may need images. A predictive maintenance project may need sensor time series. A monitoring system may need logs and traces.

The risk is that a data lake can become a data swamp. If teams dump files into storage without ownership, metadata, quality checks, access control, or lifecycle rules, nobody knows what the data means or whether it can be trusted.

A lake therefore needs discipline:

clear zones, such as raw, cleaned, curated, and archived,
metadata and cataloguing,
data owners,
quality checks,
access rules,
retention policies,
lineage and documentation.

Flexibility is valuable only if people can still understand and govern what has been stored.
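Zones and minimal metadata can be enforced with very little code. The sketch below assumes one possible path convention (`zone/dataset/ingest_date=.../file`) and one possible promotion rule. Both are illustrative choices, not a standard.

```python
from datetime import date
from pathlib import PurePosixPath

ZONES = ("raw", "cleaned", "curated", "archived")


def object_path(zone: str, dataset: str, ingest_date: date, filename: str) -> PurePosixPath:
    """Build a predictable storage path so files do not land in arbitrary places."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return PurePosixPath(zone, dataset, f"ingest_date={ingest_date.isoformat()}", filename)


def can_promote(metadata: dict) -> bool:
    """Allow data to leave the raw zone only when basic metadata is recorded."""
    required = ("owner", "description", "retention_days")
    return all(metadata.get(key) for key in required)


path = object_path("raw", "support_tickets", date(2024, 3, 1), "tickets.json")
print(path)
print(can_promote({"owner": "support-team", "description": "Helpdesk tickets"}))
print(can_promote({"owner": "support-team", "description": "Helpdesk tickets",
                   "retention_days": 365}))
```

A convention like this costs almost nothing on day one and is much harder to retrofit after thousands of files have been stored without it.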

4. Lakehouse: combining lake flexibility with warehouse structure

A lakehouse is an architectural pattern that tries to combine the flexibility and scalability of a data lake with some of the structure, governance, and query performance expected from a data warehouse.

The basic idea is that data stays in low-cost, lake-style object storage, typically in open file formats, while a table format, metadata layer, transaction support, governance controls, and query engines on top make that data easier to use for analytics and AI.

A lakehouse is attractive because many organizations do not want two separate worlds: one data lake for data science and raw data, and one data warehouse for reporting. Separate systems can lead to duplicated data, inconsistent definitions, complex pipelines, and unclear governance.

Lakehouses are strong when:

the organization has both BI and AI workloads,
raw and curated data need to coexist,
teams want to reduce data copying,
structured and semi-structured data are both important,
governance must apply across analytics and machine learning,
the same platform should support SQL, data engineering, and AI workflows.

For AI, the lakehouse idea is especially relevant because modern AI projects often need both raw and curated data. For example:

a forecasting model may use curated sales tables,
a RAG system may use documents and metadata,
an evaluation pipeline may use logs, labels, and user feedback,
a monitoring system may combine structured metrics with unstructured traces.

The lakehouse can provide a shared foundation for these different workloads, but it is not a magic solution. It still needs data modeling, access control, cataloguing, ownership, quality checks, and operational discipline.

For a startup or SME, a lakehouse pattern may be useful if the organization expects both analytics and AI workloads to grow. But it may be overkill if the current need is simply a few clean reporting tables or a small curated document store.

Practical rule: A lakehouse is useful when you genuinely need both lake flexibility and warehouse-style reliability. It is not automatically necessary for every first AI project.

5. Data mesh: an ownership model, not just a storage system

A data mesh is often misunderstood. It is not simply another storage technology. It is mainly an organizational and architectural approach to data ownership.

The central idea is that data should be owned by the domains that understand it best. For example, sales owns sales data, operations owns operational data, support owns support data, and finance owns finance data. These domains treat important datasets as products that other teams can use.

A data mesh is useful when centralized data teams become bottlenecks. In a traditional setup, every data request may flow through one central team. That team is expected to understand every domain, clean every dataset, define every metric, and serve every user. At larger scale, this becomes difficult.

Data mesh tries to solve this by distributing responsibility while keeping common standards. A domain-owned dataset should have:

a clear owner,
documented meaning,
quality expectations,
access rules,
service-level expectations where needed,
metadata,
interfaces that other teams can use.
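The checklist above can be written down as a small, checkable contract. This is a hedged sketch: the `DataProduct` fields, the freshness rule, and the example dataset are assumptions chosen to mirror the list, not a data mesh standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataProduct:
    name: str
    owner: str
    description: str          # documented meaning
    required_columns: tuple   # part of the interface other teams rely on
    max_age: timedelta        # service-level expectation for freshness


def contract_problems(product, columns, last_updated, now) -> list:
    """Return a list of ways the dataset currently breaks its own contract."""
    problems = []
    if not product.owner.strip():
        problems.append("no owner")
    if not product.description.strip():
        problems.append("no documented meaning")
    missing = [c for c in product.required_columns if c not in columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if now - last_updated > product.max_age:
        problems.append("stale data")
    return problems


tickets = DataProduct(
    name="support_tickets",
    owner="support",
    description="One row per customer support ticket",
    required_columns=("ticket_id", "created_at", "category"),
    max_age=timedelta(hours=24),
)

problems = contract_problems(
    tickets,
    columns=["ticket_id", "created_at"],
    last_updated=datetime(2024, 3, 1, 6, 0),
    now=datetime(2024, 3, 2, 12, 0),
)
print(problems)
```

Running a check like this before an AI pipeline consumes a dataset turns ownership from a slide into something a team can actually test.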

For AI, this matters because AI systems often fail when nobody owns the data they depend on. A RAG assistant needs document owners. A forecasting model needs consistent feature definitions. A customer assistant needs accurate customer and product data. A monitoring system needs clear accountability for logs and feedback.

However, data mesh is not usually the first architecture pattern a small organization should implement formally. It requires maturity: domain ownership, shared standards, data product thinking, governance, and platform support.

For SMEs, the useful lesson from data mesh is often simpler: make data ownership explicit. You may not need a full mesh, but you do need to know who owns customer data, product data, support documents, policies, training examples, and evaluation datasets.

6. Comparing the four patterns

The four patterns can be compared by what problem they mainly solve.

Warehouse: “Can we trust our structured business reporting?”

A warehouse is strongest when the organization needs consistent metrics and structured analytics. It is usually the best fit for dashboards, reporting, finance, sales analytics, and curated model features.

Lake: “Can we store and explore many kinds of data?”

A lake is strongest when the organization needs flexible storage for raw, varied, or unstructured data. It is useful for experimentation, data science, logs, documents, images, and future use cases that are not yet fully defined.

Lakehouse: “Can we unify analytics and AI on one governed foundation?”

A lakehouse is strongest when the organization wants both lake-style flexibility and warehouse-style reliability. It can reduce duplication and support a broader set of analytics, engineering, and AI workloads.

Mesh: “Can the people who understand the data own and serve it?”

A mesh is strongest when the organizational bottleneck is ownership, not storage. It helps large or complex organizations distribute data responsibility while maintaining shared governance standards.

The important point is that these patterns are not always mutually exclusive. A company might use a lakehouse as the technical foundation and apply data mesh principles for ownership. Or it might have a warehouse for finance and reporting, a lake for raw documents and logs, and a small curated knowledge base for RAG.

7. How these patterns show up in AI projects

Different AI use cases point toward different architectural needs.

Business forecasting

Demand forecasting, churn prediction, sales forecasting, and pricing models usually need structured, curated, historical data. A warehouse or lakehouse is often useful because the model depends on consistent definitions, time-based history, and reproducible training data.

RAG and knowledge assistants

RAG systems depend on documents, metadata, search indexes, and access control. A lake, lakehouse, or document-management system may be involved, but the key issue is not only storage. It is knowledge-base quality: authoritative sources, owners, freshness, permissions, and retrieval evaluation.

Computer vision and sensor data

Image, video, audio, and sensor workloads often need lake-style storage because the raw data is large and varied. The organization may also need specialized metadata, annotation systems, and high-throughput data loading for training.

AI product analytics

Monitoring AI assistants, model outputs, user feedback, guardrail triggers, and cost often benefits from structured analytics. A warehouse or lakehouse can help teams understand how the AI system behaves over time.

Enterprise-wide AI adoption

When many departments build AI systems, data ownership becomes critical. This is where data mesh thinking becomes useful: which domain owns which data product, which quality expectations apply, and how other teams can safely reuse the data.

8. The SME view: start with the data problem, not the architecture label

SMEs should avoid architecture-by-buzzword. The most practical approach is to describe the data problem first.

Ask:

Do we mostly need trusted dashboards and structured metrics?
Do we need to store raw files, logs, documents, images, or sensor data?
Do we need both reporting and AI experimentation on the same data foundation?
Is our biggest issue technology, or is it unclear ownership?
Do we have enough data volume and team maturity to justify a more complex platform?
Can we explain who owns each important dataset?
Can we reproduce the data used for an AI model or report?

The answers often reveal the right first step.

If the problem is inconsistent reporting, start with warehouse-style modeling and metric definitions. If the problem is messy raw files and future AI use cases, start with lake-style storage plus metadata. If the problem is duplication between analytics and AI teams, consider lakehouse patterns. If the problem is that nobody owns the data, apply data mesh principles before buying another tool.
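
The mapping in the paragraph above can be written down as a small lookup, which is sometimes handy in a workshop or intake form. This is only a sketch: the problem labels and the `first_step` function are hypothetical names introduced here, and the recommendations simply restate the text.

```python
# Hypothetical problem labels; the recommendations restate the paragraph above.
FIRST_STEPS = {
    "inconsistent_reporting": "warehouse-style modeling and metric definitions",
    "messy_raw_files": "lake-style storage plus metadata",
    "analytics_ai_duplication": "lakehouse patterns",
    "unclear_ownership": "data mesh principles before buying another tool",
}


def first_step(problem: str) -> str:
    """Return the suggested first step for a named data problem."""
    try:
        return FIRST_STEPS[problem]
    except KeyError:
        known = ", ".join(sorted(FIRST_STEPS))
        raise ValueError(
            f"Unknown problem {problem!r}; expected one of: {known}"
        ) from None


print(first_step("unclear_ownership"))
```

A real organization will often match more than one label. In that case, the ownership question is usually worth settling first, because the other steps depend on it.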

9. Common mistakes

Mistake 1: Building a lake without governance

A data lake without metadata, ownership, and lifecycle rules becomes a dumping ground. It may store everything, but help nobody.

Mistake 2: Using a warehouse for every kind of data

Warehouses are excellent for structured analytics, but they are not always the best home for raw documents, images, logs, or experimental data.

Mistake 3: Calling a storage platform a data mesh

Data mesh is mainly about domain ownership and data-as-a-product. Buying a tool does not create a mesh if domains do not own and serve their data responsibly.

Mistake 4: Choosing lakehouse because it sounds modern

A lakehouse can be powerful, but it still requires governance, engineering, and operating discipline. It should solve a real need, not satisfy a trend.

Mistake 5: Ignoring small-scale simplicity

Many organizations can start with a small curated warehouse, a well-managed document repository, or a simple data lake zone before adopting a full platform architecture.

10. Architecture and governance must evolve together

Data architecture is not only technical. It is also organizational. If data ownership, quality expectations, access rights, and lifecycle processes are unclear, no architecture pattern will save the project.

A practical governance baseline includes:

dataset or document owners,
clear access rules,
metadata and cataloguing,
versioning or history where needed,
quality checks,
retention and deletion rules,
lineage for important reports or AI training data,
review processes for high-risk sources.

These practices matter whether you choose warehouse, lake, lakehouse, or mesh. The more AI systems depend on the data, the more important these practices become.
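
One lightweight way to make this baseline visible is a catalogue record per dataset and a check that reports what is still missing. The sketch below assumes a very small in-code catalogue. `DatasetRecord`, its field names, and `governance_gaps` are illustrative, not a standard schema or a specific catalogue tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal catalogue entry; field names are illustrative only."""
    name: str
    owner: str = ""
    access_rule: str = ""
    description: str = ""                   # metadata for discovery
    retention_days: Optional[int] = None    # None means no rule defined yet
    quality_checks: List[str] = field(default_factory=list)
    lineage_source: str = ""                # where the data came from


def governance_gaps(record: DatasetRecord) -> List[str]:
    """List which baseline practices are still missing for a dataset."""
    gaps = []
    if not record.owner:
        gaps.append("owner")
    if not record.access_rule:
        gaps.append("access rule")
    if not record.description:
        gaps.append("metadata")
    if record.retention_days is None:
        gaps.append("retention rule")
    if not record.quality_checks:
        gaps.append("quality checks")
    if not record.lineage_source:
        gaps.append("lineage")
    return gaps


support_docs = DatasetRecord(
    name="support_documents",
    owner="support-team",
    access_rule="internal staff only",
)
print(governance_gaps(support_docs))
```

In this example the record has an owner and an access rule, so the check reports the remaining four items. The same idea works in a spreadsheet; the point is that gaps are listed explicitly rather than discovered during an incident.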

11. A practical decision guide

Use the following guide as a first approximation.

Choose a warehouse-style approach if your main goal is trusted reporting, structured analytics, finance, sales, operations dashboards, or curated predictive features.
Choose a lake-style approach if your main goal is to store raw, varied, or unstructured data such as logs, documents, and images, whether for exploration, data science, RAG, or future uses that are not yet defined.
Choose a lakehouse-style approach if you need both analytics and AI workloads on a shared governed foundation and want to reduce unnecessary data copying.
Use data mesh principles if the main issue is ownership across departments: who owns the data, who guarantees quality, and how other teams can safely reuse it.

These choices can be combined. The goal is not to force every dataset into one architectural category. The goal is to make the data foundation understandable and useful.

12. What this means for startups and SMEs

For a startup or SME, the right first architecture is usually the simplest one that supports the next safe step.

A practical maturity path could look like this:

Stage 1 — Organize: identify where important data and documents live, who owns them, and which sources are authoritative.
Stage 2 — Curate: create clean reporting tables, a trusted document collection, or a small controlled lake zone for AI experiments.
Stage 3 — Govern: add metadata, access rules, versioning, and quality checks.
Stage 4 — Scale: adopt warehouse, lake, or lakehouse tooling as volume, users, and workloads grow.
Stage 5 — Distribute ownership: apply data mesh principles when multiple domains must own and serve reusable data products.

This staged approach prevents a common trap: buying a sophisticated platform before the organization has decided what data matters, who owns it, and how it will be used.

13. Questions before choosing a platform

Before selecting a warehouse, lake, lakehouse, or mesh strategy, decision makers should ask:

What are the first three use cases this architecture must support?
Are they reporting, AI experimentation, RAG, model training, monitoring, or operational workflows?
What data types are involved: tables, documents, logs, images, sensor data, or mixed data?
Who owns the most important data sources?
How much governance is required?
How will users discover and understand available data?
How will access control work?
How will outdated or duplicated data be handled?
What skills does the team have to operate the architecture?
Can we start smaller and expand when the need is proven?

These questions make architecture a business decision, not only an IT decision.

Bottom line: Warehouses, lakes, lakehouses, and meshes are not competing buzzwords. They solve different problems. Warehouses provide trusted structured analytics. Lakes provide flexible storage for varied data. Lakehouses aim to combine flexibility with structure. Data mesh focuses on ownership and accountability. The right choice depends on your data, use cases, governance needs, team maturity, and AI ambitions.

🧩 Task

Interactive task: Choose the architecture that sounds closest to your current needs. You’ll get a plain-language recommendation.

Module 7 · Section 3

Hosting models: managed cloud, self-hosted, or hybrid

This section focuses on where models actually run — and the trade-offs between speed, cost control, and strategic independence.

For many teams, model hosting decisions are driven by convenience. But hosting is also about data movement, latency, privacy, vendor dependence, and operational burden.

Deep Dive: A practical view of hosting choices

Hosting is a convenience-vs-control trade-off. Many SMEs end up with a hybrid setup: managed where speed matters most, self-hosted where data sensitivity matters most.

Once an AI model or foundation-model workflow becomes useful, the next question is: where does it actually run? This is the model-hosting decision. It sounds technical, but it has direct consequences for cost, latency, privacy, security, vendor dependence, engineering effort, and the long-term flexibility of the AI system.

Model hosting is not only about “cloud or not cloud”. A team might call a managed model API, deploy a model endpoint on a cloud ML platform, self-host an open-weight model on GPUs, run a smaller model on a CPU server, place inference at the edge, or combine several of these approaches. The right choice depends on the workload and on what the organization is willing to operate.

Hosting is not just an IT detail. It determines who controls the model, where data flows, how predictable costs are, how fast responses can be, and how much operational responsibility your team takes on.

1. What “model hosting” means

Hosting a model means making its predictive or generative capability available to an application, user, workflow, or system. In a simple application, this might mean sending a prompt to a model API and receiving an answer. In a more complex application, it might mean deploying a model as an endpoint, scaling GPUs, routing requests, monitoring latency, logging outputs, validating structured responses, and controlling access.

Hosting decisions affect several AI patterns:

Prompt-based applications: where the model is called through an API or endpoint.
RAG systems: where the model receives retrieved context and generates grounded answers.
Classification and extraction workflows: where the model returns labels or structured fields.
Fine-tuned models: where a specialized model must be deployed and maintained.
Batch pipelines: where predictions are computed for many records at once.
Edge applications: where the model runs close to the device, user, or machine.

A model can be technically excellent but poorly hosted. If it is too slow, too expensive, unavailable at peak times, difficult to update, or not allowed to process the required data, the application will fail in practice.

2. Managed cloud and model APIs: speed and convenience

The fastest hosting path is usually a managed model API or managed cloud AI service. The provider operates the model infrastructure. Your application sends requests and receives responses. This can be the easiest way to prototype and launch early AI features.

Managed hosting is attractive because the provider usually handles:

model serving infrastructure,
GPU allocation,
scaling,
model updates,
availability,
basic endpoint management,
and often some safety and optimization layers.

For startups and SMEs, this can be extremely valuable. The team can focus on the use case, user experience, prompts, retrieval, workflow integration, and evaluation instead of building a serving stack from scratch.

Managed hosting is often a good fit when:

you need to validate value quickly,
you do not yet know which model is best,
internal ML infrastructure skills are limited,
usage is moderate or uncertain,
speed of development matters more than full control,
the data can legally and contractually be processed by the provider.

The downside is that convenience creates dependency. You depend on the provider’s model availability, pricing, rate limits, terms of service, regional availability, data-processing rules, and future product decisions.

3. Watch-outs for managed hosting

Managed hosting should not be treated as “free infrastructure”. It shifts operational burden to the provider, but it does not remove the organization’s responsibility.

Key watch-outs include:

Data privacy: what data is sent to the provider?
Data retention: are prompts, files, outputs, or logs stored?
Compliance: is the provider acceptable for your regulatory environment?
Cost variability: what happens when usage grows?
Latency: is response time good enough for the workflow?
Rate limits: can the endpoint handle peak demand?
Model changes: can behaviour change when the provider updates the model?
Vendor lock-in: how hard would it be to switch model provider?
Fallbacks: what happens if the API is unavailable?

For early prototypes, these concerns can often be handled with simple controls. For production workflows, they need explicit design decisions.

A practical approach is to separate application logic from provider-specific logic. Your application should not be so tightly coupled to one provider that changing models requires rewriting the entire system. This does not mean avoiding managed APIs. It means using them intentionally.
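
A minimal sketch of that separation, assuming the application only needs a single text-completion call: the application depends on a small interface, and each provider sits behind its own adapter. `TextModel`, `ManagedApiModel`, `LocalModel`, and `summarize_ticket` are hypothetical names. The adapters are placeholders and do not call any real provider SDK.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only interface the application logic depends on."""

    def complete(self, prompt: str) -> str: ...


class ManagedApiModel:
    """Placeholder adapter; a real one would call the provider's SDK here."""

    def complete(self, prompt: str) -> str:
        # Simulate an outage so the fallback path below is exercised.
        raise ConnectionError("provider unavailable (simulated)")


class LocalModel:
    """Placeholder adapter for a self-hosted or smaller fallback model."""

    def complete(self, prompt: str) -> str:
        return f"[local draft] {prompt[:40]}"


def summarize_ticket(text: str, primary: TextModel, fallback: TextModel) -> str:
    """Application logic: builds the prompt, knows nothing about providers."""
    prompt = f"Summarize this support ticket in one sentence:\n{text}"
    try:
        return primary.complete(prompt)
    except ConnectionError:
        # Fallback for when the primary endpoint is unavailable.
        return fallback.complete(prompt)


result = summarize_ticket("Printer is offline again.", ManagedApiModel(), LocalModel())
print(result)
```

Swapping providers then means writing one new adapter and rerunning the evaluation cases, not rewriting the application. The same structure also gives a natural place for the fallback behaviour listed among the watch-outs.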

4. Self-hosting: control and responsibility

Self-hosting means that your organization runs the model itself. The model might be an open-weight foundation model, a fine-tuned model, a classical ML model, or a smaller specialized model. It may run on your own servers, in your private cloud, on rented GPU instances, in a Kubernetes cluster, or on HPC infrastructure.

Self-hosting gives more control over:

where data is processed,
which model version is used,
how the model is optimized,
how endpoints are secured,
how logs are handled,
how costs are managed,
how updates and rollbacks happen,
whether the model can be customized or fine-tuned.

This control can be valuable when data is sensitive, regulatory requirements are strict, workloads are large and predictable, or the model capability is strategically important.

But self-hosting also means your team owns the operational burden. Someone must provision infrastructure, deploy the model, monitor the endpoint, handle failures, patch dependencies, manage GPU utilization, optimize latency, track cost, secure access, and update the model safely.

Practical rule: Self-hosting gives you more control, but only if you have the skills and processes to use that control responsibly.

5. What self-hosting actually requires

Self-hosting a model is more than downloading weights and starting a server. A production setup needs several components.

At minimum, consider:

Model artefacts: where the model weights, tokenizer, configuration, and versions live.
Serving runtime: the software used to load the model and respond to requests.
Hardware: CPU, GPU, memory, storage, and network capacity.
Endpoint security: authentication, authorization, and rate limits.
Scaling: how to handle more users or larger requests.
Monitoring: latency, throughput, errors, GPU utilization, memory use, and cost.
Logging: enough to debug, but not so much that sensitive data is over-retained.
Deployment process: how new versions are tested, released, and rolled back.
Fallbacks: what happens when the endpoint fails or becomes overloaded.

For small models, this can be manageable. For large language models, especially those requiring GPUs, it becomes a serious infrastructure task.

6. Hybrid hosting: combine speed and control

Hybrid hosting means using more than one hosting approach. This is often the most realistic path once an organization moves beyond early experimentation.

A hybrid setup might use:

a managed API for general-purpose drafting and summarization,
a self-hosted model for sensitive internal data,
a smaller local model for low-cost classification,
a cloud GPU endpoint for occasional heavy workloads,
edge inference for low-latency factory or device use cases,
HPC resources for fine-tuning, benchmarking, or batch experimentation.

Hybrid approaches are useful because not all AI workloads have the same requirements. A chatbot for internal brainstorming, a RAG assistant for confidential documents, and a computer-vision quality-control system may require different hosting choices.

The danger is complexity. Hybrid systems require clear routing rules, consistent security, monitoring across environments, data-flow documentation, and people who understand the whole picture.

Hybrid is a good choice when it solves a real problem. It is a poor choice when it is adopted only because the organization cannot decide.

7. Online prediction versus batch prediction

Hosting decisions also depend on whether predictions are needed immediately or can be computed ahead of time.

Online prediction means the model responds when a user or system sends a request. This is needed for chatbots, customer-support assistants, live recommendations, fraud checks, interactive search, or real-time document workflows.

Online prediction requires attention to:

latency,
availability,
autoscaling,
request spikes,
timeouts,
user experience,
fallback behaviour.

Batch prediction means predictions are computed for many items at once, often on a schedule. This is useful for daily scoring, monthly forecasting, overnight document processing, customer segmentation, or preparing recommendations before users need them.

Batch prediction can reduce latency pressure because the user does not wait for the model. It can also make better use of compute resources. But it is less flexible when information changes quickly.

For many SMEs, batch is underused. Not every AI system needs real-time inference. If a report, forecast, or classification can be computed overnight, batch hosting may be simpler, cheaper, and more robust.

8. Latency: where time is actually spent

Teams often focus only on model speed, but end-to-end latency includes more than model inference.

A user request may involve:

network transfer,
authentication,
retrieval from a vector database or search system,
database queries,
prompt construction,
model inference,
tool calls,
output validation,
post-processing,
logging.

A faster model does not help much if retrieval is slow, the network round trip is long, or the system calls several tools sequentially. Hosting decisions should therefore consider the complete workflow, not only the model endpoint.
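
To see where time is actually spent, it can help to time each stage of a request separately. The sketch below uses only the Python standard library. `StageTimer` is a hypothetical helper, and the `time.sleep` calls are stand-ins for real retrieval, inference, and validation steps.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageTimer:
    """Collects wall-clock time per named stage of one request."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the stage raised an exception.
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        return max(self.durations, key=lambda name: self.durations[name])


timer = StageTimer()
with timer.stage("retrieval"):
    time.sleep(0.05)    # stand-in for a vector search
with timer.stage("model_inference"):
    time.sleep(0.02)    # stand-in for the model call
with timer.stage("output_validation"):
    time.sleep(0.001)   # stand-in for schema checks

total = sum(timer.durations.values())
for name, seconds in timer.durations.items():
    print(f"{name}: {seconds * 1000:.0f} ms ({seconds / total:.0%} of total)")
print("slowest stage:", timer.slowest())
```

With a breakdown like this, a team can check whether a faster model would help at all, or whether retrieval or tool calls dominate the request.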

Practical latency improvements may include:

using a smaller model for simple tasks,
caching repeated answers or retrieved context,
reducing prompt length,
streaming responses to users,
batching requests,
placing inference closer to users or devices,
precomputing results where possible,
parallelizing independent tool calls.

Hosting is therefore a product decision as well as an infrastructure decision. Users experience latency as part of quality.
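
As an illustration of one item from the list above, parallelizing independent tool calls, the sketch below runs two lookups concurrently with `asyncio.gather`. The two coroutines are hypothetical stand-ins that only sleep. In a real system they would be a CRM query and a document search, and the gain depends on those calls being genuinely independent.

```python
import asyncio
import time
from typing import Dict, List


async def fetch_customer(customer_id: str) -> Dict[str, str]:
    await asyncio.sleep(0.1)   # stand-in for a CRM lookup
    return {"customer_id": customer_id, "tier": "standard"}


async def search_docs(query: str) -> List[str]:
    await asyncio.sleep(0.1)   # stand-in for a document search
    return [f"doc about {query}"]


async def gather_context(customer_id: str, query: str) -> dict:
    # Both calls start immediately; total wait is about the slower of the two,
    # not the sum of both.
    customer, docs = await asyncio.gather(
        fetch_customer(customer_id),
        search_docs(query),
    )
    return {"customer": customer, "docs": docs}


start = time.perf_counter()
context = asyncio.run(gather_context("c-42", "refund policy"))
elapsed = time.perf_counter() - start
print(context)
print(f"elapsed: {elapsed:.2f} s")
```

If the second call needed the result of the first, the calls would have to run sequentially, and this technique would not apply.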

9. Cost: model hosting has unit economics

AI hosting cost can grow quickly. In managed APIs, cost is often driven by token volume, request volume, model choice, retrieval steps, and tool calls. In self-hosting, cost is driven by hardware, GPU utilization, storage, engineering time, operations, monitoring, and redundancy.

The important question is not simply “Which is cheaper?” It is: which hosting path gives acceptable quality at acceptable cost for this workload?

Managed APIs may be cheaper for low or uncertain usage because you pay for consumption and avoid infrastructure setup. Self-hosting may become attractive when usage is high, predictable, and technically manageable. Hybrid may be useful when only some workloads justify self-hosting.

A practical cost review should include:

cost per request,
cost per successful workflow,
input and output token volume,
number of model calls per user action,
retrieval and embedding costs,
GPU utilization,
engineering and operations time,
monitoring and logging retention,
expected growth in users or documents.

A demo that costs very little for ten users may become expensive for ten thousand users. Hosting choices should therefore be revisited when the usage pattern becomes clearer.
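
A rough unit-economics calculation makes this concrete. The sketch below assumes token-based pricing with separate input and output prices. All numbers (prices, token counts, calls per action, actions per user) are made-up placeholders, not any provider's actual rates.

```python
def cost_per_action(
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
    calls_per_action: int = 1,
) -> float:
    """Cost of one user action that triggers one or more model calls."""
    per_call = (
        input_tokens / 1000 * price_in_per_1k
        + output_tokens / 1000 * price_out_per_1k
    )
    return per_call * calls_per_action


def monthly_cost(users: int, actions_per_user: int, action_cost: float) -> float:
    """Total monthly model cost for a given number of active users."""
    return users * actions_per_user * action_cost


# Placeholder assumptions: 2,000 input and 500 output tokens per call,
# 3 model calls per user action, 100 actions per user per month.
action_cost = cost_per_action(
    input_tokens=2000,
    output_tokens=500,
    price_in_per_1k=0.001,
    price_out_per_1k=0.002,
    calls_per_action=3,
)

for users in (10, 10_000):
    total = monthly_cost(users, actions_per_user=100, action_cost=action_cost)
    print(f"{users:>6} users: {total:,.2f} per month")
```

Under these placeholder assumptions the monthly bill grows from about 9 to about 9,000 currency units. The per-action cost is unchanged; only the volume differs, which is why the number of model calls per user action deserves as much attention as the price per token.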

10. Privacy and data movement

Hosting determines where data goes. In a managed API setup, prompts, retrieved context, files, or structured records may leave your environment and be processed by a provider. In a self-hosted setup, data can stay within your own environment, but only if the entire application is designed that way.

Important questions include:

What user data is sent to the model?
Are retrieved documents included in prompts?
Are prompts and outputs logged?
Where are logs stored?
Which countries or regions process the data?
Can sensitive data be redacted before model calls?
Are provider terms acceptable for this use case?
Do users know what data is being processed?

For low-risk internal drafting, a managed API may be fine. For confidential contracts, customer records, HR documents, healthcare data, or regulated workflows, the hosting decision needs much more care.

The rule is simple: do not decide hosting separately from data classification.
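
One way to apply this rule is to make the data classification an explicit input to the routing decision. The sketch below is a simplified illustration. The classification levels, the `MANAGED_API_CEILING` policy, and the function names are assumptions introduced here, and the email pattern is deliberately crude; it is not a complete redaction solution.

```python
import re
from enum import Enum


class DataClass(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    REGULATED = 4


# Hypothetical policy: the highest classification allowed to reach a managed API.
MANAGED_API_CEILING = DataClass.INTERNAL

# Crude illustrative pattern; real redaction needs much broader coverage.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def choose_hosting(classification: DataClass) -> str:
    """Route a request based on the classification of the data it carries."""
    if classification.value <= MANAGED_API_CEILING.value:
        return "managed_api"
    return "self_hosted"


def redact_emails(text: str) -> str:
    """Replace email addresses before the text is placed in a prompt."""
    return EMAIL_PATTERN.sub("[EMAIL]", text)


print(choose_hosting(DataClass.INTERNAL))
print(choose_hosting(DataClass.CONFIDENTIAL))
print(redact_emails("Contact anna@example.com today"))
```

The value of a function like `choose_hosting` is less in the code than in the fact that the policy is written down once, reviewed by whoever owns data protection, and applied the same way to every request.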

11. Vendor dependence and optionality

Managed services can accelerate development, but they can also create lock-in. Lock-in is not always bad. A good vendor can save time and reduce operational risk. But lock-in should be intentional, not accidental.

Vendor dependence can appear in several places:

model-specific prompt behaviour,
provider-specific APIs,
proprietary embedding models,
managed vector stores,
logging and evaluation tools,
fine-tuning formats,
security and governance integrations,
pricing and rate-limit structures.

To preserve optionality, teams can:

keep prompts and evaluation sets outside the provider interface where possible,
use abstraction layers carefully,
record model and prompt versions,
avoid unnecessary provider-specific features in early prototypes,
maintain test cases that can compare providers,
store important application data in systems the organization controls.

Optionality does not mean switching providers every month. It means being able to switch when business, regulatory, quality, or cost reasons justify it.

12. Security and access control

A hosted model endpoint is a powerful system component. It may process sensitive data, inform business decisions, call tools, or expose outputs to users. It therefore needs security controls like any other production service.

Key controls include:

authentication: who can call the model endpoint?
authorization: what is each user or service allowed to request?
rate limits: how are abuse and accidental overload prevented?
network controls: where can requests come from?
secret management: where are API keys and credentials stored?
logging: what is recorded and who can inspect it?
data minimization: is unnecessary sensitive data kept out of prompts?
output validation: are generated outputs checked before downstream use?

In RAG systems, access control is especially important. The model should only receive retrieved documents that the user is allowed to see. Do not retrieve everything and then hope the model hides restricted content.
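
The sketch below illustrates the principle: documents are filtered by the user's permissions before any text reaches the prompt. It assumes a simple group-based permission model. `Document`, `permitted`, and `build_prompt` are hypothetical names, and a production system would usually apply the same filter inside the retrieval query itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]


def permitted(docs: List[Document], user_groups: FrozenSet[str]) -> List[Document]:
    """Keep only documents that share at least one group with the user."""
    return [doc for doc in docs if doc.allowed_groups & user_groups]


def build_prompt(question: str, docs: List[Document], user_groups: FrozenSet[str]) -> str:
    """Build the model prompt from permitted documents only."""
    visible = permitted(docs, user_groups)
    context = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in visible)
    return f"Context:\n{context}\n\nQuestion: {question}"


retrieved = [
    Document("handbook-1", "Travel must be approved in advance.", frozenset({"all-staff"})),
    Document("hr-7", "Salary bands for 2024.", frozenset({"hr"})),
]

prompt = build_prompt(
    "What is the travel policy?",
    retrieved,
    user_groups=frozenset({"all-staff", "sales"}),
)
print(prompt)
```

Here the salary document never enters the prompt for a sales user, so the model cannot leak it, regardless of how the question is phrased.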

13. Monitoring hosted models

Once a model is hosted, monitoring becomes essential. It is not enough to know whether the endpoint is “up”. You need to know whether it is useful, safe, affordable, and performing within acceptable limits.

Monitor:

latency,
throughput,
error rate,
timeout rate,
cost per request,
token usage,
GPU or CPU utilization,
memory use,
queue length,
fallback usage,
invalid output rate,
user feedback,
quality and safety incidents.

For self-hosted models, infrastructure monitoring is especially important: GPU memory, GPU utilization, server health, storage, networking, and deployment status. For managed APIs, cost, latency, provider errors, rate limits, and model-behaviour changes are often more important.

Monitoring should connect to ownership. Someone must be responsible for reviewing alerts, costs, failures, and user feedback.
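
Several of the metrics above can be derived from per-request logs. The sketch below assumes each request is logged with a latency, a success flag, and a cost. `RequestLog` and `summarize_logs` are illustrative names, and the percentile uses the simple nearest-rank method, not interpolation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestLog:
    latency_ms: float
    ok: bool
    timed_out: bool = False
    used_fallback: bool = False
    cost: float = 0.0


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_logs(logs: List[RequestLog]) -> Dict[str, float]:
    """Aggregate request logs into a small set of monitoring metrics."""
    if not logs:
        raise ValueError("no requests to summarize")
    n = len(logs)
    return {
        "requests": n,
        "error_rate": sum(not r.ok for r in logs) / n,
        "timeout_rate": sum(r.timed_out for r in logs) / n,
        "fallback_rate": sum(r.used_fallback for r in logs) / n,
        "p95_latency_ms": percentile([r.latency_ms for r in logs], 95),
        "cost_per_request": sum(r.cost for r in logs) / n,
    }


# Synthetic example data: 20 requests, one of which timed out.
logs = [RequestLog(latency_ms=100.0 * i, ok=True, cost=0.01) for i in range(1, 20)]
logs.append(RequestLog(latency_ms=2000.0, ok=False, timed_out=True, cost=0.01))
print(summarize_logs(logs))
```

A summary like this is only useful if someone looks at it. Thresholds and alert routing belong with the named owner of the endpoint.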

14. Model updates and deployment safety

Hosting also includes updating models safely. A new model version may improve one task but degrade another. A provider update may change tone, refusal behaviour, tool use, output structure, or latency. A self-hosted update may introduce dependency problems or hardware incompatibilities.

Safer deployment practices include:

testing new model versions on realistic evaluation cases,
using shadow deployments where the new model runs without affecting users,
using canary releases for a small percentage of traffic,
keeping rollback options,
recording model, prompt, and retrieval versions,
monitoring quality, latency, and cost after deployment,
communicating changes to users when behaviour may change.

A hosted model is not “done” after launch. It is a living service.
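
A canary release needs a stable way to decide which requests see the new version. One common approach, sketched below, hashes the user ID into one of 100 buckets so that each user consistently gets the same version during the rollout. The function names, the salt, and the version labels are hypothetical.

```python
import hashlib


def canary_bucket(user_id: str, salt: str = "rollout-1") -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_model_version(
    user_id: str,
    canary_percent: int,
    stable: str = "model-v1",
    canary: str = "model-v2",
) -> str:
    """Send roughly canary_percent of users to the canary version."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return canary if canary_bucket(user_id) < canary_percent else stable


users = [f"user-{i}" for i in range(10_000)]
on_canary = sum(pick_model_version(u, canary_percent=10) == "model-v2" for u in users)
print(f"{on_canary / len(users):.1%} of users routed to the canary")
```

Rolling back means setting `canary_percent` to 0. Changing the salt for each rollout avoids always exposing the same users to new versions first.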

15. Edge hosting: when models need to run close to the action

In some use cases, the model should run near the device, machine, user, or production environment. This is edge hosting. It can be useful when latency, connectivity, privacy, or bandwidth make centralized inference difficult.

Edge hosting can fit:

factory quality inspection,
robotics,
predictive maintenance,
vehicles,
medical devices,
retail or point-of-sale devices,
offline or low-connectivity environments,
privacy-sensitive on-device processing.

The benefits are clear: lower network latency, reduced data transfer, better offline operation, and sometimes stronger privacy. But edge hosting introduces constraints: limited memory, limited compute, battery or power limits, hardware variation, update complexity, physical device security, and fleet management.

Edge is therefore not simply “better privacy” or “faster inference”. It is a different operational model. The model must be small and efficient enough for the device, and the organization must be able to update and monitor deployed devices.

16. How to choose between managed, self-hosted, and hybrid

The hosting decision should follow the use case. Start with the requirements, not the ideology.

Choose managed cloud or model APIs when:

speed of development is the priority,
the use case is still being validated,
usage is uncertain,
the provider’s data terms are acceptable,
internal infrastructure skills are limited,
you need access to strong general-purpose models quickly.

Choose self-hosting when:

data sensitivity or sovereignty is a major concern,
usage is high and predictable,
you need tighter control over model versions,
you need custom optimization or fine-tuning,
vendor dependence is strategically risky,
you have the operational skills to run the system.

Choose hybrid when:

different workloads have different risk and performance profiles,
sensitive workloads need local control but generic tasks can use managed APIs,
you want a gradual path from API-based prototypes to controlled production hosting,
you need both speed and strategic optionality,
some workloads require edge, HPC, or specialized hardware.

For many SMEs, the practical path is staged: start managed, evaluate value, measure cost and latency, identify sensitive workloads, and then selectively self-host or hybridize where the business case is clear.

17. What this means for startups and SMEs

Startups and SMEs should not begin with the most complex hosting architecture. They should begin with the simplest hosting choice that supports learning safely.

A practical maturity path is:

Experiment: use managed APIs or simple endpoints to validate the use case.
Pilot: add logging, cost tracking, evaluation, access control, and fallback behaviour.
Operationalize: choose whether managed hosting remains sufficient or whether sensitive, high-volume, or strategic workloads need more control.
Optimize: reduce cost and latency through caching, smaller models, batching, quantization, or self-hosting where justified.
Hybridize: combine managed, self-hosted, edge, or HPC resources only when requirements clearly demand it.

This approach avoids both extremes: outsourcing everything without control, and building an infrastructure platform before the use case is proven.

18. Questions before committing to a hosting path

Before choosing a hosting model, decision makers should ask:

What data will be sent to the model?
Is the workload experimental, operational, or business-critical?
Does the application need real-time responses or can it run in batch?
What latency is acceptable for users?
How predictable is usage volume?
What would the cost look like at 10x usage?
Can the provider’s terms support this use case?
Do we need control over model version and update timing?
Who will monitor the endpoint?
What happens when the endpoint fails?
How hard would it be to switch provider or architecture later?

These questions turn hosting from a technical preference into a business decision.

Bottom line: Managed hosting gives speed and reduces operational burden. Self-hosting gives control but creates responsibility. Hybrid hosting can balance both, but adds complexity. The right choice depends on data sensitivity, latency, usage volume, cost, skills, governance, and how strategic the AI capability is. Start simple, measure honestly, and add control only where it creates real value.

🧩 Task

Interactive task: Choose the hosting path you are leaning toward. You’ll get the main watch-outs to consider before committing.

Module 7 · Section 4

Cloud, on-prem, and hybrid reality

This section helps you think beyond slogans. Most organizations do not need a pure answer — they need a practical combination.

Deep Dive: When hybrid makes sense

Cloud, on-premises, and hybrid infrastructure are often discussed as if one of them must be the “right” answer. In reality, most organizations do not need a pure ideology. They need a practical deployment stance that fits their data, risk, skills, budget, existing systems, and AI ambitions.

For AI, the question is not simply “cloud or on-prem?” The question is: which workloads should run where, and why? Some workloads benefit from the speed and managed services of the cloud. Some need tighter control because of sensitive data, latency, regulation, or existing systems. Some may use HPC or specialized GPU resources for short periods. Some may run at the edge because data should stay near a machine, device, or factory floor.

Hybrid infrastructure means combining environments intentionally: public cloud, private cloud, on-premises systems, edge devices, and sometimes HPC resources. Hybrid can be powerful, but it is not automatically better. It gives flexibility, but it also creates coordination complexity.

Hybrid makes sense when different workloads have genuinely different requirements. It becomes a problem when it is created accidentally by disconnected tool choices, unclear ownership, or unresolved governance decisions.

1. Start with the workload, not the slogan

A common mistake is to begin with a fixed infrastructure preference: “everything should go to the cloud”, “everything must stay on-prem”, or “we need multicloud to avoid lock-in”. These preferences may be understandable, but they are not enough for good architecture.

AI workloads differ. A prompt-based prototype, a RAG assistant, a fine-tuning job, a batch document-processing pipeline, a real-time customer-facing chatbot, a factory vision system, and a regulated decision-support workflow all have different infrastructure needs.

Good infrastructure decisions start by classifying the workload:

Is it experimental, pilot-stage, or production-critical?
Does it use public, internal, confidential, personal, or regulated data?
Does it need real-time responses or can it run in batch?
Does it need GPUs continuously or only occasionally?
Does it need to integrate with existing internal systems?
Does it need strict audit trails or data residency?
Who will operate and monitor it?

Once these questions are clear, the infrastructure choice becomes less ideological. The organization can decide which environment is best suited for each workload.

2. What cloud is good at

Public cloud is attractive because it gives fast access to infrastructure and managed services. Teams can provision compute, storage, databases, model APIs, logging, monitoring, and security services without buying hardware or building everything from scratch.

Cloud is often a good fit when:

speed of experimentation matters,
usage is uncertain or spiky,
the team wants managed databases, managed ML services, or model APIs,
the workload needs temporary GPU capacity,
the organization does not want to operate hardware,
the data can legally and contractually be processed in the cloud,
the team needs collaboration across locations.

For startups and SMEs, cloud can be the fastest way to start. A team can test an AI idea without buying servers, setting up a GPU cluster, or maintaining its own data centre. This is especially useful when the use case is still uncertain.

Cloud also offers a wide range of managed services. These can reduce operational burden, because the provider handles much of the infrastructure maintenance. For a small team, this can be a major advantage.

But cloud is not magic. It still requires architecture, governance, cost monitoring, access control, and operational ownership. A badly designed cloud system can become expensive, insecure, and difficult to migrate away from.

3. What on-premises infrastructure is good at

On-premises infrastructure means systems operated in an organization’s own environment, such as local servers, private data centres, private GPU machines, internal databases, or specialized equipment close to operations.

On-premises can be attractive when:

data must remain inside organizational boundaries,
regulatory or contractual requirements restrict external processing,
legacy systems are difficult to move,
latency to local machines or devices matters,
the organization already has significant infrastructure investments,
workloads are predictable and high-volume,
the organization wants tighter control over model versions, data flows, or network access.

On-premises infrastructure can provide control, but it also creates responsibility. The organization must manage hardware, capacity planning, security patches, backups, access control, monitoring, power, cooling, upgrades, and incident response.

For AI workloads, this can become demanding. GPUs are expensive, fast-moving, and often hard to keep highly utilized. A company may buy powerful hardware and then discover that the bottleneck is storage, data loading, environment management, or lack of specialist staff.

On-premises therefore makes most sense when the control advantage is real and the organization can operate the environment properly.

4. Why hybrid exists

Hybrid architecture exists because real organizations are mixed. They often have existing systems, new cloud services, sensitive data, legacy databases, SaaS tools, local devices, and emerging AI workloads at the same time.

Hybrid becomes attractive for several reasons:

Data residency: some data must stay in a specific country, region, or controlled environment.
Legacy investments: important systems may already run on-premises and cannot easily move.
Transition: migration to the cloud may take years, so both worlds coexist.
Burst capacity: cloud or HPC can provide extra compute for occasional large workloads.
Best-of-breed services: different platforms may offer better tools for different tasks.
Cost control: predictable workloads may be cheaper on owned infrastructure, while variable workloads may fit cloud better.
Specialized hardware: GPUs, accelerators, edge devices, or HPC systems may be needed only for specific workloads.

Hybrid is therefore not a failure to choose. It can be a rational design. But it should be deliberate. Accidental hybrid happens when teams adopt tools independently and only later discover that data, identity, monitoring, and governance are fragmented.

Practical rule: Hybrid should solve a real constraint: data residency, latency, existing systems, burst compute, cost, or specialized hardware. Do not choose hybrid only because it sounds flexible.

5. The hidden cost of hybrid: coordination complexity

Hybrid infrastructure is not only technically complex. It is organizationally complex. Different environments often have different access systems, security models, cost structures, monitoring tools, deployment processes, and operational teams.

Hybrid creates coordination challenges such as:

consistent governance across environments,
identity and access management across cloud and on-prem systems,
data synchronization,
network connectivity,
duplicate datasets,
different runtime environments,
different logging and monitoring tools,
unclear ownership between IT, data teams, and product teams,
higher skill requirements for staff.

For example, a company may keep customer data on-premises but run model inference in the cloud. That sounds simple until the team must decide which data can leave, how it is anonymized, how prompts are logged, how failures are monitored, and how cloud outputs are written back to internal systems.

Hybrid therefore needs architecture discipline. Without it, the organization may end up with multiple copies of data, inconsistent permissions, unclear audit trails, and workflows that break whenever one environment changes.

6. Data movement is often the real bottleneck

AI teams often focus on compute: CPUs, GPUs, memory, and model size. But in hybrid systems, data movement can be the real bottleneck.

Moving data between environments can create:

latency,
network bandwidth constraints,
egress costs,
security review requirements,
data duplication,
version conflicts,
lineage problems,
uncertainty about which copy is authoritative.

This is especially important for AI because datasets can be large. Document collections, logs, images, videos, sensor data, model checkpoints, embeddings, and training datasets can be expensive or slow to move.

Before designing a hybrid architecture, ask:

Where is the source data created?
Where is the authoritative copy?
Where does processing happen?
Where are model outputs stored?
How often must data move?
Can we move computation to the data instead of moving data to computation?
What data can be summarized, embedded, anonymized, or cached?

In many cases, the best architecture is the one that minimizes unnecessary data movement.
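A rough estimate often settles the "move the data or move the computation" question before any architecture is drawn. The sketch below is a back-of-the-envelope calculation: the function names, the 70% link efficiency, and the example price per GB are assumptions, and real figures depend on the network and on the provider's price list.

```python
def transfer_hours(size_gb: float, bandwidth_mbit_s: float, efficiency: float = 0.7) -> float:
    """Hours needed to move size_gb over a link rated at bandwidth_mbit_s.

    efficiency is an assumed fraction of the rated bandwidth that is
    actually achieved (protocol overhead, shared links).
    """
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = size_megabits / (bandwidth_mbit_s * efficiency)
    return seconds / 3600


def egress_cost(size_gb: float, price_per_gb: float) -> float:
    """Cost of moving size_gb out of an environment at a given price per GB."""
    return size_gb * price_per_gb


# Example: a 2 TB image collection over a 1 Gbit/s link.
hours = transfer_hours(size_gb=2000, bandwidth_mbit_s=1000)
# Hypothetical price of 0.08 per GB; check the actual provider pricing.
cost = egress_cost(size_gb=2000, price_per_gb=0.08)
print(f"about {hours:.1f} hours per copy, about {cost:.0f} in egress fees per copy")
```

If the same dataset has to move every week, multiplying these numbers by the number of copies per year usually makes the case for processing the data where it already lives.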

7. Identity and access control across environments

In a simple single-environment system, access control is already important. In hybrid infrastructure, it becomes harder because different environments may use different identity systems and permission models.

Public cloud providers have their own identity and access management systems. On-premises environments may use directory services, local accounts, VPNs, Kerberos, LDAP, or other enterprise identity systems. SaaS tools may have their own user roles. HPC systems often have account and project-based access models.

This matters for AI because AI systems often sit across boundaries:

a RAG assistant retrieves internal documents,
a model endpoint runs in the cloud,
logs are stored in another system,
users authenticate through enterprise identity,
outputs are written back to an internal database.

A weak access-control design may allow the AI system to retrieve information that the user should not see, or write outputs to systems it should not modify.

Strong hybrid access control should define:

how users are identified,
which roles map across systems,
which data sources each role can access,
which model endpoints each application can call,
which actions require approval,
how access is revoked when people change roles,
how access events are logged and reviewed.

Access control should be enforced in the system architecture, not left to prompt instructions.
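One way to enforce this in a RAG-style application is to filter retrieved documents against the user's roles before any text reaches the prompt. The sketch below assumes a hypothetical `Document` type with an `allowed_roles` field and a hypothetical `authorized_context` helper; a real system would take roles from the enterprise identity provider rather than from a hard-coded set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this document


def authorized_context(candidates, user_roles):
    """Keep only documents the user may read.

    This runs before prompt assembly, so the model never sees
    text the user is not entitled to.
    """
    roles = set(user_roles)
    return [doc for doc in candidates if roles & doc.allowed_roles]


retrieved = [
    Document("hr-001", "Salary bands for 2024", frozenset({"hr"})),
    Document("pol-007", "Travel expense policy", frozenset({"hr", "staff"})),
]

visible = authorized_context(retrieved, user_roles={"staff"})
print([doc.doc_id for doc in visible])
```

The design choice is that the filter is ordinary code with a testable result. An instruction such as "do not reveal HR documents" in the prompt offers no comparable guarantee.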

8. Cost: cloud flexibility versus cost predictability

Cloud is attractive because it is elastic. You can scale up quickly and pay for what you use. This is useful for experimentation and variable workloads. But cloud cost can become hard to predict when usage grows, model calls multiply, storage accumulates, logs expand, or data moves between regions and services.

On-premises infrastructure can provide more predictable cost once hardware is purchased, but it requires upfront investment and can be underutilized. If expensive GPUs sit idle, the effective cost per useful workload becomes high.
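The effect of idle hardware can be made concrete with a simple calculation: spread the purchase price over the expected lifetime, add running costs, and divide by the share of hours that do useful work. All numbers below are hypothetical and the function name is an assumption for this sketch.

```python
def effective_cost_per_useful_hour(
    purchase_price: float,
    lifetime_hours: float,
    hourly_operating_cost: float,
    utilization: float,
) -> float:
    """Cost of one hour of useful work on owned hardware.

    utilization is the fraction of hours (0 < utilization <= 1)
    in which the hardware runs a real workload.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hourly_total = purchase_price / lifetime_hours + hourly_operating_cost
    return hourly_total / utilization


three_years = 3 * 365 * 24  # assumed depreciation period in hours

busy = effective_cost_per_useful_hour(40_000, three_years, 0.50, utilization=0.8)
idle = effective_cost_per_useful_hour(40_000, three_years, 0.50, utilization=0.1)
print(f"80% utilized: {busy:.2f} per useful hour; 10% utilized: {idle:.2f}")
```

The same machine becomes eight times more expensive per useful hour when utilization drops from 80% to 10%, which is the comparison to make against an hourly cloud price.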

Hybrid cost management therefore requires a full view:

cloud compute cost,
model API cost,
storage cost,
data egress cost,
logging and monitoring cost,
on-prem hardware depreciation,
operations staff time,
utilization of GPUs or servers,
cost of outages or slow delivery,
cost of vendor lock-in or migration.

For SMEs, the key is not to calculate everything perfectly on day one. The key is to make cost visible early. A prototype should already track rough cost per workflow, model calls, tokens, GPU hours, storage, and data movement where relevant.
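Rough cost tracking does not need a finance platform. The sketch below shows one possible shape for a per-workflow ledger. The `CostLedger` class, the item names, and all unit prices are hypothetical, and a prototype could just as well start with a spreadsheet.

```python
from collections import defaultdict


class CostLedger:
    """Accumulates usage per workflow and prices it with a unit-price table."""

    def __init__(self, unit_prices):
        self.unit_prices = dict(unit_prices)
        self.usage = defaultdict(lambda: defaultdict(float))

    def record(self, workflow, item, quantity):
        if item not in self.unit_prices:
            raise KeyError(f"no unit price defined for {item!r}")
        self.usage[workflow][item] += quantity

    def cost(self, workflow):
        return sum(
            quantity * self.unit_prices[item]
            for item, quantity in self.usage[workflow].items()
        )


# Hypothetical prices; replace with figures from real invoices.
ledger = CostLedger({"tokens_1k": 0.25, "gpu_hours": 2.0, "egress_gb": 0.5})

ledger.record("invoice-extraction", "tokens_1k", 120)
ledger.record("invoice-extraction", "gpu_hours", 3)
ledger.record("invoice-extraction", "egress_gb", 10)

print(ledger.cost("invoice-extraction"))
```

Recording usage per workflow rather than per cloud account is the point of the sketch: it lets the team compare what a workflow costs with the value it delivers.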

9. Portability: useful, but not free

Many organizations want portability: the ability to move workloads between providers, cloud and on-prem environments, or different compute platforms. This is understandable because AI tooling changes quickly and vendor lock-in can create strategic risk.

But portability has a cost. If you avoid every provider-specific feature, you may lose the productivity benefits that made the provider attractive. If you build abstraction layers too early, you may slow the team down before the use case is proven.

A practical approach is selective portability:

keep core business data in systems you control,
store prompts, evaluation sets, and application logic outside vendor-only interfaces where possible,
use open formats for important datasets and model artefacts,
containerize workloads that may need to move,
avoid unnecessary dependence on one provider’s proprietary APIs early,
accept managed services where they clearly reduce operational burden.

Portability should be based on risk. A short-lived prototype may not need strong portability. A strategic AI product or regulated workflow may need much more.

10. Cloud-first, on-prem-first, or workload-first?

Some organizations adopt a “cloud-first” strategy. Others prefer “on-prem-first” because of data sensitivity, legacy systems, or existing investments. For AI, a more practical framing is often workload-first.

Workload-first means each workload is placed where it best fits:

low-risk experimentation may use managed cloud APIs,
sensitive RAG may run in a controlled environment,
large fine-tuning jobs may use HPC or temporary GPU cloud capacity,
factory inference may run at the edge,
core customer data may remain on-premises,
dashboards may use a cloud data warehouse or lakehouse.

This approach avoids one-size-fits-all architecture. It also forces teams to document the reason for each placement decision.
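A lightweight way to make the reason mandatory is to record each placement as a structured decision that cannot be created without one. The sketch below is an assumption about how a team might do this. The `PlacementDecision` type is hypothetical, and the example reasons are illustrative wording for the placements listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementDecision:
    workload: str
    environment: str
    reason: str

    def __post_init__(self):
        # A placement without a documented reason is rejected.
        if not self.reason.strip():
            raise ValueError(f"placement of {self.workload!r} needs a documented reason")


decisions = [
    PlacementDecision("low-risk experimentation", "managed cloud API",
                      "fast validation, no sensitive data involved"),
    PlacementDecision("factory inference", "edge device",
                      "latency to local machines matters"),
    PlacementDecision("large fine-tuning job", "HPC or temporary GPU cloud capacity",
                      "occasional heavy compute, not continuous hosting"),
]

for d in decisions:
    print(f"{d.workload} -> {d.environment} ({d.reason})")
```

Keeping these records next to the architecture documentation gives later reviewers the original constraint, so they can check whether it still holds.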

11. The role of HPC in hybrid reality

HPC can be part of a hybrid AI strategy when workloads require substantial compute but not necessarily continuous production hosting. Examples include large training or fine-tuning jobs, large-scale benchmarking, simulation-driven AI, computer vision training, synthetic data generation, and batch processing.

For startups and SMEs, shared HPC or EuroHPC resources can provide access to advanced compute that would otherwise be difficult to afford. This can be especially useful for experimentation, capability building, and workloads with clear compute requirements.

But HPC also has a different operating model:

jobs are scheduled rather than started instantly,
data must be staged and managed carefully,
software environments need planning,
interactive development may require special workflows,
users need to understand queues, accounts, filesystems, and resource requests.

In hybrid reality, HPC is not a replacement for all cloud or on-prem infrastructure. It is a powerful option for the workloads that benefit from it.

12. Governance in hybrid environments

Hybrid environments make governance harder because the organization must apply policies across more than one place. Data may be stored on-premises, processed in the cloud, indexed in a search service, logged in an observability tool, and used by an AI application running elsewhere.

Governance questions include:

Where is each dataset stored?
Which copy is authoritative?
Who owns each dataset?
Who can access it in each environment?
Where are prompts and outputs logged?
How long are logs retained?
How are model versions tracked?
How are cross-environment incidents investigated?
How are deletions and corrections propagated?

The more environments are involved, the more important it becomes to have a simple inventory of data flows, systems, owners, and controls. Without this, hybrid architecture becomes difficult to audit and difficult to improve.
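Such an inventory can start as a short list of records with an automated check for gaps. The sketch below is a minimal illustration. The `DataFlow` fields and the `inventory_gaps` helper are assumptions, and the example entries are invented.

```python
from dataclasses import dataclass, field


@dataclass
class DataFlow:
    dataset: str
    source: str
    destination: str
    owner: str = ""
    authoritative_copy: str = ""
    controls: list = field(default_factory=list)  # e.g. "anonymized", "encrypted in transit"


def inventory_gaps(flows):
    """Return (dataset, problem) pairs for flows that lack basic governance facts."""
    gaps = []
    for flow in flows:
        if not flow.owner:
            gaps.append((flow.dataset, "no owner"))
        if not flow.authoritative_copy:
            gaps.append((flow.dataset, "authoritative copy not defined"))
        if not flow.controls:
            gaps.append((flow.dataset, "no controls recorded"))
    return gaps


flows = [
    DataFlow("customer tickets", "on-prem CRM", "cloud inference endpoint",
             owner="support lead", authoritative_copy="on-prem CRM",
             controls=["anonymized before transfer"]),
    DataFlow("prompt logs", "cloud inference endpoint", "observability tool"),
]

for dataset, problem in inventory_gaps(flows):
    print(f"{dataset}: {problem}")
```

Running a check like this whenever a new flow is added keeps the inventory honest without requiring a dedicated governance tool.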

13. Skills: hybrid needs broader operational capability

Hybrid systems require broader skills than a single managed platform. Teams may need to understand cloud services, on-prem networks, identity systems, containers, data movement, security, monitoring, deployment automation, HPC workflows, and cost management.

For SMEs, this is a major consideration. A hybrid architecture that looks good on a diagram may be unrealistic if nobody can operate it.

Before committing to hybrid, ask:

Who will maintain each environment?
Who understands the data flows between them?
Who responds when something fails?
Who controls access and credentials?
Who monitors cost and usage?
Who can debug cross-environment problems?

Hybrid can be the right architecture, but it needs ownership. Otherwise, it becomes a set of disconnected systems held together by manual workarounds.

14. A practical deployment stance for SMEs

A practical SME stance is usually staged and workload-based.

Start with cloud or managed services when the use case is low-risk, experimental, and needs fast validation.
Keep sensitive data controlled when personal, confidential, regulated, or customer-specific information is involved.
Use on-premises or private environments when data locality, existing systems, or latency justify the extra operational responsibility.
Use HPC or temporary GPU capacity for compute-heavy training, fine-tuning, simulation, or batch processing workloads.
Hybridize deliberately only when there is a clear reason and a clear owner for the resulting complexity.

This stance keeps the organization flexible without forcing unnecessary complexity too early.

15. Questions before choosing hybrid

Before adopting a hybrid architecture, decision makers should ask:

Which exact workloads require more than one environment?
What data must stay local, and why?
What data may move to the cloud, and under which conditions?
Where will AI inference happen?
Where will training, fine-tuning, or batch processing happen?
How will access control work across environments?
How will logs, costs, and incidents be monitored?
How will data copies remain synchronized?
Who owns each environment and each data flow?
What is the exit strategy if one platform becomes too expensive or unsuitable?

If these questions cannot be answered, the hybrid plan may be premature.

16. Common anti-patterns

Anti-pattern 1: Accidental hybrid

Different teams adopt different tools without a shared architecture. Later, the organization discovers that data, identity, monitoring, and costs are fragmented.

Anti-pattern 2: Cloud without cost visibility

The team moves quickly but does not track usage, tokens, storage, egress, or model calls. Costs only become visible after the workflow scales.

Anti-pattern 3: On-prem control without operations capacity

The organization keeps everything internal but lacks the staff, tooling, or processes to manage AI workloads reliably.

Anti-pattern 4: Portability theatre

The architecture claims to be portable, but important data, prompts, workflows, or monitoring are still locked into a specific provider.

Anti-pattern 5: Moving data because compute is elsewhere

Large datasets are repeatedly copied across environments, creating cost, latency, security risk, and confusion about which copy is current.

17. What this means for decision makers

Decision makers do not need to become cloud architects, but they do need to understand the trade-offs. Cloud gives speed and managed capability. On-prem gives control but requires operations. Hybrid gives flexibility but increases coordination complexity.

The right question is not: “Which infrastructure is best?”

The right question is: “Which infrastructure stance lets us deliver this AI workload safely, affordably, and repeatably — with the people and constraints we actually have?”

For many organizations, the answer will change over time. A prototype may start in the cloud. A sensitive pilot may move into a controlled environment. A compute-heavy training job may run on HPC. A production system may use a hybrid architecture. This evolution is normal, as long as each step is deliberate.

Bottom line: Cloud, on-premises, and hybrid infrastructure are not competing ideologies. They are deployment options. Cloud is strong for speed and managed capability. On-premises is strong for control and locality. Hybrid is useful when workloads genuinely require a mix — but it adds governance, access-control, data-movement, cost, monitoring, and skill complexity. Choose hybrid only when the value of flexibility is greater than the cost of coordination.

🧩 Task

Interactive task: Select the pressures that apply to you. Then generate a suggested deployment stance.

Module 7 · Section 5

HPC and EuroHPC: where they fit in the AI journey

This section explains the role of high-performance computing for demanding AI workloads — especially training, large-scale testing, and experimentation.

HPC is not the default answer for every AI task. It becomes valuable when you need large compute capacity, parallel experimentation, or access to specialized resources that would be inefficient or too expensive to build alone.

Deep Dive: When HPC and EuroHPC become relevant for AI

HPC and EuroHPC matter when training workloads or compute demand are genuinely heavy. Prove value on cheap compute first, then scale up; do not over-provision early.

High-performance computing, or HPC, can sound intimidating. Many people associate it with climate simulations, physics, engineering, or large research institutions. But HPC is becoming increasingly relevant for AI because modern AI workloads often need more compute, memory, storage bandwidth, and parallel processing than a laptop or ordinary server can provide.

For startups and SMEs, the important point is not that HPC is “the biggest possible infrastructure”. The important point is that shared European HPC infrastructure can give smaller organizations access to advanced compute that would otherwise be difficult or expensive to obtain. This is especially relevant when AI projects move beyond simple API use and begin to involve large datasets, large models, fine-tuning, batch inference, simulation, synthetic data generation, or scalable experimentation.

EuroHPC and the European AI Factories ecosystem are designed to make this kind of advanced computing more accessible. EuroHPC JU describes AI Factories as ecosystems around AI-optimised supercomputers that offer computing resources and support services to industry, research users, startups, and SMEs. The current EuroHPC AI Factory access modes explicitly include Playground access for entry-level users, Fast Lane access for users already familiar with HPC, and Large Scale access for AI models and applications requiring larger allocations.

HPC is not the first tool for every AI project. It becomes relevant when compute, data volume, model size, experiment scale, or European data/control requirements exceed what is practical with ordinary local or SaaS-based infrastructure.

1. What HPC means in practical terms

HPC means using many powerful computing resources together to solve workloads that are too large, too slow, or too expensive on ordinary machines. Instead of one laptop or one server, HPC systems provide many nodes connected by high-speed networks, shared storage, and a scheduler that distributes work across users and jobs.

In AI, HPC can support:

training or fine-tuning models on GPUs,
running large batch inference jobs,
processing very large document or image collections,
generating embeddings at scale,
benchmarking models across many configurations,
running distributed training experiments,
generating synthetic data,
combining AI with simulation or scientific computing,
testing workloads before moving to production infrastructure.

HPC systems are usually shared. Users do not normally “own” a machine continuously. Instead, they request resources for a job: number of nodes, CPUs, GPUs, memory, and runtime. The job scheduler then starts the job when the requested resources are available.

This is different from opening a cloud console and starting a virtual machine immediately. It requires a different workflow, but it also makes it possible for many users to share very large systems fairly and efficiently.

2. Why HPC matters for AI

Many AI use cases can start with a managed model API, a small open-source model, or a no-code workflow. HPC becomes relevant when the workload grows in one of several directions.

The first direction is model size. Large language models and vision models may require GPUs with large memory, or multiple GPUs working together. Fine-tuning a model, even with efficient techniques, may still require hardware that is not available locally.

The second direction is data size. A RAG system for a small policy collection can be indexed on a laptop. A system that processes millions of documents, images, logs, or scientific files may need much more storage throughput and parallel processing.

The third direction is experiment scale. AI projects often require trying different models, hyperparameters, prompts, chunking strategies, retrieval settings, or training configurations. HPC makes it possible to run many experiments in parallel or to run larger experiments than a single workstation can handle.

The fourth direction is combining AI with simulation. In engineering, climate, energy, medicine, materials, and manufacturing, AI may be used together with simulation workloads that already belong naturally on HPC systems.

The fifth direction is European capability and sovereignty. For some organizations, access to European compute capacity, European support networks, and public innovation infrastructure is strategically important.

3. EuroHPC and AI Factories: shared infrastructure plus support

EuroHPC JU is a European initiative involving the EU, European countries, and private partners to develop a world-class supercomputing ecosystem. Its AI Factories initiative extends this idea toward AI, offering AI-optimised HPC resources, support services, training, and technical expertise. EuroHPC describes the AI Factories as one-stop-shop ecosystems for European AI innovation, including support for startups and SMEs.

For SMEs and startups, this is important because advanced AI infrastructure is usually hard to access. Buying GPUs is expensive. Cloud GPU capacity can be costly or limited. Hiring infrastructure specialists is difficult. Publicly supported access paths can therefore reduce the barrier to serious experimentation.

The EuroHPC AI Factory Industrial Innovation access modes are designed for different levels of need:

Playground access: intended for SMEs, startups, and entry-level users who need rapid, limited access to test technologies. EuroHPC describes Playground access as offering rapid FIFO access, including access within two working days and onboarding services for new users.
Fast Lane access: intended for users already familiar with HPC and requiring up to 50,000 GPU hours.
Large Scale access: intended for AI models and applications requiring more than 50,000 GPU hours.

This staged structure is useful pedagogically. It shows that HPC access is not only for the largest research projects. There are entry-level routes for exploration, larger routes for serious development, and large-scale routes for ambitious AI workloads.
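To see which route a project is likely to fall under, it helps to estimate GPU hours first. The sketch below multiplies out a resource request and maps the result onto the three modes described above. The 50,000 GPU-hour boundary comes from the text. Treating Playground as the route for users new to HPC is an assumption of this example, since no GPU-hour limit for Playground is given here, and actual eligibility is defined by the EuroHPC call documents.

```python
FAST_LANE_MAX_GPU_HOURS = 50_000  # boundary between Fast Lane and Large Scale, per the text


def estimated_gpu_hours(nodes: int, gpus_per_node: int, hours_per_run: float, runs: int = 1) -> float:
    """GPU hours consumed by `runs` jobs of the given shape."""
    return nodes * gpus_per_node * hours_per_run * runs


def suggest_access_mode(gpu_hours: float, familiar_with_hpc: bool) -> str:
    """Rough mapping onto the access modes; not an eligibility check."""
    if gpu_hours > FAST_LANE_MAX_GPU_HOURS:
        return "Large Scale"
    if familiar_with_hpc:
        return "Fast Lane"
    return "Playground"  # assumed entry route for teams new to HPC


# Example: ten fine-tuning runs, each 12 hours on 2 nodes with 4 GPUs.
pilot = estimated_gpu_hours(nodes=2, gpus_per_node=4, hours_per_run=12, runs=10)
print(pilot, suggest_access_mode(pilot, familiar_with_hpc=False))
```

Even a coarse estimate like this is useful in an access application, because it shows that the team has thought about job shape and number of runs rather than asking for "some GPUs".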

4. What an AI Factory adds beyond hardware

For many SMEs, hardware is not the only barrier. The harder barriers are knowing what to ask for, how to prepare data, how to run jobs, how to choose models, how to structure experiments, and how to avoid wasting compute.

This is why AI Factories and onboarding initiatives matter. AI:AT’s HPC onboarding material, for example, frames onboarding as a way to help SMEs use Europe’s shared HPC infrastructure for prototyping AI models, fine-tuning existing models, and experimenting with available resources.

Useful support services may include:

help choosing whether HPC is the right fit,
guidance on access routes and applications,
account setup and login support,
training on job submission and filesystems,
advice on software environments and containers,
support for model fine-tuning and inference workflows,
help with scaling from one GPU to multiple GPUs or nodes,
performance and resource-efficiency guidance,
connections to the broader AI ecosystem.

This support is especially important because the first barrier for SMEs is often not lack of ambition. It is uncertainty: “Can we even use this system?”, “What should we request?”, “How do we move data?”, “How do we run a notebook?”, “How do we avoid wasting allocation?”.

5. HPC is not the same as cloud

Cloud and HPC can both provide compute, but they usually feel different to users.

In a cloud environment, a user often starts an instance, connects to it, installs software, and keeps it running as long as needed. Billing is usually usage-based, and many managed services are available.

In an HPC environment, a user typically logs into a login node, prepares files and scripts, and submits jobs to a scheduler such as Slurm. The scheduler starts the job on compute nodes when resources are available. Users may also work with modules, shared filesystems, quotas, batch queues, project accounts, and different partitions.

This difference matters because HPC rewards preparation. A well-prepared job can use many GPUs efficiently. A poorly prepared job may wait in the queue, fail quickly, or waste valuable resources.

The HPC mindset is therefore:

prepare data before the job starts,
define the required resources explicitly,
use batch scripts for repeatability,
write logs and outputs to known locations,
test small before scaling up,
monitor resource usage and efficiency,
respect shared-resource etiquette.

For teams used to interactive cloud instances, this may require adjustment. But it also teaches good habits: reproducibility, resource awareness, and disciplined experimentation.

6. Leonardo as an example of an HPC system

CINECA’s Leonardo system is a useful example because it is one of Europe’s major supercomputing systems and is commonly used for AI and scientific workloads. CINECA’s documentation describes Leonardo as a system with different modules, including Booster and DCGP resources, managed through Slurm partitions and queues.

For learners, the exact hardware details are less important than the pattern:

there are login nodes for access and preparation,
there are compute nodes for running jobs,
there are partitions for different types of resources,
there are quality-of-service and walltime limits,
there are shared filesystems for data and results,
jobs are submitted through a scheduler,
resource requests affect scheduling and accounting.

This is the core HPC operating model. Users do not simply run heavy workloads on the login node. They use the login node to prepare and submit jobs, and the heavy computation happens on allocated compute nodes.

Understanding this distinction is essential. Many beginners make the mistake of treating an HPC login node like a personal server. On shared systems, login nodes are for lightweight preparation, not heavy training or inference.

7. Typical AI workloads that fit HPC

HPC is most useful when the workload is substantial enough to benefit from advanced resources. Good AI candidates include:

Fine-tuning large models: especially when one GPU is not enough or when many experiments are required.
Distributed training: training across multiple GPUs or nodes with frameworks such as PyTorch Distributed, DeepSpeed, FSDP, or Accelerate.
Large-scale inference: running batch inference over many documents, images, or records.
Embedding generation: creating embeddings for large document collections or multimodal datasets.
Computer vision workloads: training or fine-tuning image models on large image collections.
Synthetic data generation: generating large numbers of examples, images, scenarios, or simulations.
Benchmarking: comparing models, prompts, retrieval settings, or scaling strategies.
Simulation plus AI: combining traditional HPC simulation with ML surrogates, parameter studies, or AI-assisted analysis.

HPC is usually less necessary for simple prompt engineering, low-volume model API use, small RAG prototypes, or ordinary office automation. Those can often start with managed tools, local development, or cloud services.

Practical rule: Use HPC when the workload benefits from parallel compute, large memory, high-throughput storage, specialized GPUs, or large experiment scale. Do not use HPC merely because the project involves AI.

8. The first HPC challenge: making the workload ready

A common SME mistake is to think of HPC access as the main challenge. Access matters, but the workload must also be ready.

A workload is HPC-ready when:

the goal is clear,
input data is prepared,
data can be moved or staged efficiently,
software dependencies are known,
the code can run non-interactively,
resource requirements are estimated,
small tests have been run before large jobs,
outputs and logs are written predictably,
the team knows how success will be evaluated.

If a project is still at the stage of “we have many files somewhere and maybe want to train something”, HPC may not yet be the next step. The next step may be data preparation, scoping, a small local test, or a prototype.

HPC amplifies both good and bad preparation. A well-designed job scales. A messy job fails at scale.

9. Data movement and storage on HPC

AI workloads often involve large data: model checkpoints, datasets, image collections, embeddings, logs, or generated outputs. On HPC, data movement and storage strategy are therefore central.

Teams should ask:

Where is the data now?
How large is it?
How many files are involved?
Does the workload read many small files or fewer large files?
Where should data be staged for fast access during jobs?
What outputs must be kept?
What can be deleted after the experiment?
Are there quotas or retention rules?
Does sensitive data need special handling?

Data layout can have a major performance impact. Many tiny files may overload metadata operations. Large sequential reads may perform better. Temporary scratch storage may be appropriate for intermediate data, while important results should be copied to more durable storage.

For RAG or document-processing workloads, the same issue appears differently. It may be inefficient to repeatedly parse millions of documents. It may be better to preprocess, deduplicate, chunk, and store intermediate representations before running large embedding jobs.
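One common remedy for the many-small-files problem is to pack files into a smaller number of archive shards before staging them. The sketch below uses Python's standard `tarfile` module. The `pack_into_shards` helper, the shard naming, and the shard size are assumptions, and the best shard size depends on the filesystem and the data loader.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_into_shards(src_dir, out_dir, files_per_shard=1000):
    """Pack all files under src_dir into uncompressed tar shards in out_dir.

    out_dir should not be inside src_dir, otherwise a later run
    would pack earlier shards as well.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    shards = []
    for start in range(0, len(files), files_per_shard):
        shard_path = out / f"shard-{start // files_per_shard:05d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for path in files[start:start + files_per_shard]:
                tar.add(path, arcname=str(path.relative_to(src)))
        shards.append(shard_path)
    return shards


# Small demonstration in a temporary directory: 5 files, 2 per shard.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    for i in range(5):
        (raw / f"doc_{i}.txt").write_text(f"document {i}")
    shards = pack_into_shards(raw, Path(tmp) / "shards", files_per_shard=2)
    shard_names = [s.name for s in shards]
    packed_files = 0
    for shard in shards:
        with tarfile.open(shard) as tar:
            packed_files += len(tar.getnames())

print(shard_names, packed_files)
```

Reading a few large archives sequentially generally puts less load on the shared filesystem's metadata service than opening each small file individually.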

10. Software environments: reproducibility matters

HPC systems usually provide shared software modules, compilers, libraries, Python environments, and container options. Users may create Conda environments, virtual environments, Apptainer/Singularity containers, or other reproducible setups.

For AI, environment management is often one of the hardest practical issues. PyTorch, CUDA, Transformers, tokenizers, datasets, acceleration libraries, and model-serving tools must work together with the installed drivers and hardware.

Good practice includes:

documenting the environment setup,
pinning important package versions,
using containers for complex or portable workflows,
testing environment creation before the training day,
separating installation steps from production job execution,
avoiding hidden dependencies in one user’s home directory,
recording model, dataset, and code versions.

Reproducible environments are not only a technical nicety. They allow teams to rerun experiments, debug failures, hand over work, and use compute allocations efficiently.
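
One lightweight way to document an environment is to record the interpreter, platform, and package versions at the start of each run. The sketch below uses the standard library's `importlib.metadata`; the function name `snapshot_environment` and the package list are illustrative assumptions.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages: list[str]) -> dict:
    """Collect interpreter, platform, and installed versions for the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Example: print the snapshot so it ends up in the job log.
print(json.dumps(snapshot_environment(["torch", "transformers", "numpy"]), indent=2))
```

This does not replace pinned requirements or a container image, but it makes it easier to see afterwards which versions a given run actually used.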

11. Running jobs: from notebook thinking to batch thinking

Many AI learners begin in notebooks. Notebooks are excellent for exploration, but large HPC workloads usually need batch jobs. A batch job is a script submitted to the scheduler. It requests resources, loads the environment, runs the program, and writes logs.

This shift matters because batch jobs are repeatable. Instead of clicking cells manually, the team can define exactly what should run:

which code version,
which dataset,
which model,
which parameters,
which resources,
which output directory,
which log files.

A good batch workflow often follows this pattern:

develop interactively on a small sample,
turn the workflow into a script,
test the script on a small job,
increase resources gradually,
monitor logs and resource usage,
save outputs and experiment metadata.

This is a cultural shift for many teams. But it is also one of the reasons HPC encourages stronger engineering discipline.
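
To illustrate the step from notebook to script, here is a minimal sketch of a batch entry point that takes its dataset, model, parameters, and output directory as explicit arguments and saves them next to the results. All argument names and file names are hypothetical, and the actual training or inference call is left as a placeholder.

```python
import argparse
import json
import logging
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical batch entry point")
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--model", required=True)
    parser.add_argument("--learning-rate", type=float, default=1e-4)
    parser.add_argument("--output-dir", type=Path, required=True)
    return parser.parse_args(argv)


def main(argv=None) -> Path:
    args = parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        filename=args.output_dir / "run.log", level=logging.INFO, force=True
    )
    # Save the exact parameters next to the outputs so the run can be repeated.
    config = {key: str(value) for key, value in vars(args).items()}
    config_path = args.output_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    logging.info("Starting run with %s", config)
    # ... the actual training or inference call would go here ...
    return config_path
```

Called as `main()` with no arguments, the function reads the command line, so a scheduler job script can invoke it with the same flags every time. The point of the design is that nothing about the run depends on manual steps.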

12. Scaling: four GPUs are not automatically four times faster than one GPU

A common assumption is that adding GPUs or nodes automatically makes AI training faster. In practice, scaling depends on communication, batch size, model size, data loading, synchronization, and how well the software uses the hardware.

Multi-GPU and multi-node training introduces overhead. GPUs must exchange gradients, synchronize parameters, or coordinate model shards. If the workload is too small, the communication overhead may outweigh the benefit. If data loading is slow, GPUs may sit idle. If the batch size is not adjusted, training dynamics may change.

Scaling questions include:

Does the model fit on one GPU?
Is the workload compute-bound or data-bound?
Can the batch size increase?
Is storage fast enough?
How much communication happens between GPUs?
Does the framework support the chosen parallelism strategy?
Does adding GPUs improve time-to-result enough to justify the resources?

For SMEs, the practical rule is to test scaling incrementally. Start with one GPU, then one node, then multiple nodes if needed. Measure performance at each step. Do not request the largest allocation before proving that the code can use it efficiently.
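
Measuring at each step can be as simple as comparing time-to-result against the single-GPU baseline. The sketch below computes speedup (baseline time divided by measured time) and parallel efficiency (speedup divided by GPU count). The function name and the timings in the usage example are made up for illustration.

```python
def scaling_report(timings: dict[int, float]) -> dict[int, dict[str, float]]:
    """Map GPU count -> speedup and efficiency, relative to the 1-GPU measurement.

    `timings` maps GPU count to measured seconds (for example per epoch)
    and must contain an entry for 1 GPU.
    """
    baseline = timings[1]
    report = {}
    for gpus, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        report[gpus] = {"speedup": speedup, "efficiency": speedup / gpus}
    return report


# Illustrative numbers only, not measurements from a real system.
example = scaling_report({1: 100.0, 2: 55.0, 4: 40.0})
for gpus, row in example.items():
    print(gpus, round(row["speedup"], 2), round(row["efficiency"], 2))
```

With these made-up timings, four GPUs give a speedup of 2.5 and an efficiency of 0.625, meaning more than a third of the added capacity is lost to overhead. What efficiency is acceptable is a judgment call for the team and the allocation.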

13. Cost and allocation responsibility

EuroHPC AI Factory industrial innovation access modes may be free of charge for eligible AI SMEs and startups, but free access does not mean resources are unlimited or should be used casually. Shared infrastructure depends on responsible usage.

Responsible HPC use means:

requesting appropriate resources,
testing small before scaling,
stopping failed runs quickly,
using debug queues where available,
avoiding idle GPUs,
cleaning up unnecessary intermediate files,
recording what was learned from each run,
reporting outcomes where required.

This matters because the scarce resource is not only hardware. It is also queue time, support time, storage, and the opportunity for other users to run their workloads.

14. When HPC is the wrong choice

HPC is powerful, but it is not always the right answer.

HPC may be the wrong first choice when:

the use case is still vague,
the data is not prepared,
the workload can run easily on a laptop or small cloud instance,
the team only needs a managed model API,
the main problem is data quality, not compute,
the team cannot yet run the code reproducibly,
the workload requires always-on low-latency production serving,
the team has no one able to learn the HPC workflow.

In these cases, a simpler environment may be better. HPC should be introduced when the compute need is real and the project is ready to benefit from it.

This is not a limitation of HPC. It is good infrastructure strategy: match the tool to the workload.

15. A practical readiness checklist

Before applying for or using HPC resources, a startup or SME can use this checklist:

Use case: Can we explain the AI workload and expected outcome?
Compute need: Why do we need HPC rather than a laptop, cloud API, or small GPU?
Data: Is the data prepared, legally usable, and transferable?
Code: Can the code run on a small sample?
Environment: Are dependencies documented or containerized?
Resources: Can we estimate GPUs, CPUs, memory, storage, and runtime?
Outputs: Do we know what artefacts, metrics, and logs we need?
Evaluation: How will we judge whether the run was successful?
Ownership: Who will run, monitor, and interpret the jobs?
Next step: What will we do with the results?

If several answers are missing, the team may need a preparation sprint before requesting larger resources.
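
The checklist can also be kept as a small structured record so gaps are easy to spot. This is a sketch only: the key names mirror the checklist items above, and the threshold of more than two missing answers is an assumption standing in for "several".

```python
CHECKLIST = (
    "use_case", "compute_need", "data", "code", "environment",
    "resources", "outputs", "evaluation", "ownership", "next_step",
)


def missing_items(answers: dict[str, str]) -> list[str]:
    """Return checklist items that have no answer or only a blank answer."""
    return [item for item in CHECKLIST if not answers.get(item, "").strip()]


def needs_preparation_sprint(answers: dict[str, str], max_missing: int = 2) -> bool:
    """Assumed rule of thumb: more than `max_missing` gaps means prepare first."""
    return len(missing_items(answers)) > max_missing
```

For example, a team that has filled in everything except `resources`, `evaluation`, and `ownership` would be flagged for a preparation sprint under this assumed threshold.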

16. What this means for decision makers

Decision makers do not need to know every scheduler command or hardware detail. But they should understand when HPC is strategically useful and what readiness means.

Good questions include:

Is our bottleneck actually compute?
What workload would we run on HPC?
How much data must move?
Do we have the skills or support to run jobs?
What will we learn from the allocation?
Can this project start with Playground access, or does it need Fast Lane or larger access?
Will the results support a product, pilot, model, benchmark, or learning goal?
How will we avoid wasting shared resources?

These questions keep HPC grounded in business and technical value. HPC is not a badge of sophistication. It is a resource to use when it shortens time-to-result, enables experiments that would otherwise be impossible, or gives access to compute capacity that the organization could not reasonably build itself.

Bottom line: HPC and EuroHPC are powerful options for AI workloads that need substantial compute, memory, storage throughput, or experiment scale. For startups and SMEs, the key is to use them deliberately: prepare the workload, start small, scale responsibly, and rely on onboarding and AI Factory support where needed. HPC is not the goal — it is an accelerator when the use case is ready for it.

🧩 Task

Interactive task: Choose the AI activity where HPC would be most relevant for you. You’ll get a practical interpretation.

Module 7 · Section 6

Infrastructure builder

Use this simplified builder to create a realistic infrastructure stance and see whether it looks balanced, fragile, or overbuilt.

Deep Dive: Building an infrastructure plan from real constraints

After discussing data architecture, hosting models, cloud versus hybrid choices, and HPC, the practical question becomes: how do you turn all of this into a concrete infrastructure plan?

For startups and SMEs, the answer should not be a long technology wish list. A useful infrastructure plan starts with the AI use case, the data involved, the required reliability, the available skills, the budget, and the next stage of maturity. The goal is to choose the smallest infrastructure foundation that supports the next safe and valuable step.

This section is called an “infrastructure builder” because it helps learners combine the previous decisions into one practical view. It is not asking, “What is the perfect architecture?” It is asking, “Given our current AI goal, what infrastructure choices are reasonable, what risks must we control, and what should we postpone until there is evidence?”

A good infrastructure plan is not the most complete diagram. It is a staged plan that connects business value, data readiness, compute needs, governance, skills, and operating responsibility.

1. Start with the AI workload

Infrastructure should be designed around workloads, not around abstract technology categories. A RAG assistant, a demand forecasting model, a no-code document workflow, a fine-tuning job, and a customer-facing AI product do not need the same infrastructure.

A useful first step is to describe the workload in plain language:

Who will use the AI system?
What task or decision does it support?
What data does it need?
Does it answer questions, generate text, extract fields, classify cases, predict values, or trigger actions?
Does it need real-time responses, scheduled batch processing, or occasional experimentation?
What happens if it is wrong, unavailable, slow, or too expensive?

These answers determine the infrastructure path. If the workload is an internal brainstorming assistant, a managed AI tool may be enough. If it is a RAG assistant for confidential documents, access control and knowledge-base governance matter. If it is large-scale fine-tuning, GPU access and reproducible training environments matter. If it is a customer product, deployment, monitoring, rollback, and support become central.

The first principle is therefore: the workload defines the infrastructure, not the other way around.

2. Identify the maturity stage

The same AI idea needs different infrastructure at different maturity stages.

Experiment

In an experiment, the goal is learning. The team wants to know whether the use case is valuable and technically plausible. Infrastructure should be lightweight.

Use small samples of data.
Prefer managed tools or simple development environments.
Track prompts, data samples, and results manually or with lightweight tooling.
Avoid sensitive production data unless controls are already clear.
Do not overbuild deployment, monitoring, or platform layers too early.

Pilot

In a pilot, real users begin to test the system in a controlled setting. Infrastructure must become more reliable and governable.

Define access roles.
Use a curated dataset or document collection.
Version prompts, code, and important configuration.
Collect user feedback.
Track cost, latency, quality, and failure cases.
Define who owns the pilot after launch.

Production

In production, users depend on the system. Infrastructure must support reliability, security, monitoring, rollback, and maintenance.

Use repeatable deployment processes.
Enforce authentication and authorization.
Monitor quality, cost, latency, usage, and errors.
Define incident response and rollback.
Manage logs and sensitive data carefully.
Assign product, technical, and operational ownership.

Many AI initiatives fail because the infrastructure maturity does not match the use-case maturity. Teams either over-engineer experiments or under-engineer production workflows.

Practical rule: Build experiment infrastructure for experiments, pilot infrastructure for pilots, and production infrastructure for production. Do not confuse the three.

3. Choose the data foundation

Every AI infrastructure plan needs a data foundation. This does not always mean a large platform. It means that the team knows where data lives, who owns it, how it is accessed, and how it is kept usable.

For structured analytics or predictive modelling, the foundation may be a warehouse or lakehouse with curated tables. For RAG, it may be an approved document repository with metadata and access control. For computer vision, it may be object storage with image metadata and annotation records. For monitoring, it may be logs, traces, feedback, and evaluation results.

The infrastructure builder should therefore ask:

Is the data structured, semi-structured, unstructured, or mixed?
Is there an authoritative source?
Who owns the data or documents?
Is the data current enough?
Are duplicates, outdated versions, or conflicting definitions present?
What metadata is needed?
What access restrictions apply?
Can the data used for an AI result be reconstructed later?

For startups and SMEs, the best first step is often not a full data platform. It may be a curated dataset, a controlled document collection, a small warehouse, or a simple lake zone with clear owners and metadata.

4. Choose the compute and hosting approach

Once the data foundation is clear, the next question is where the AI workload runs.

A model API or managed cloud service is often best for fast experimentation and early pilots. It reduces operational burden and gives access to strong models quickly. But it requires review of data terms, cost, latency, provider dependency, and fallback behaviour.

Self-hosting becomes relevant when data sensitivity, cost predictability, model control, customization, or sovereignty matter enough to justify the operational responsibility. It requires deployment, monitoring, scaling, security, and model-update processes.

Hybrid infrastructure becomes relevant when different parts of the workload have different requirements. For example:

sensitive data stays in a controlled environment,
general model calls use managed cloud APIs,
large batch jobs run on HPC or temporary GPU capacity,
edge devices run small models near machines or users,
core business systems remain on-premises.

The infrastructure builder should not automatically choose the most powerful hosting model. It should match hosting to the workload:

Use managed APIs when speed and low operational burden matter most.
Use self-hosting when control and data boundaries matter most.
Use batch processing when real-time responses are unnecessary.
Use HPC when the workload benefits from parallel compute or large GPU allocations.
Use hybrid only when it solves a real constraint, not because it sounds flexible.

5. Decide what must be repeatable

Repeatability is one of the biggest differences between a demo and an operational AI capability. A demo can be impressive even if nobody can reproduce it. A real system cannot.

The infrastructure plan should define what must be repeatable:

Can the development environment be recreated?
Can the same dataset or document snapshot be found again?
Can the same prompt version be retrieved?
Can the same model or model endpoint be identified?
Can the training, indexing, or inference job be rerun?
Can a deployment be rolled back?
Can a bad answer be traced to data, prompt, model, retrieval, or tool behaviour?

The answer does not need to be perfect at the experiment stage. But as soon as real users rely on the system, repeatability becomes essential.

Practical mechanisms include:

version control for code, prompts, and configuration,
documented environment setup,
containerized workloads where useful,
dataset and document snapshots,
model and endpoint version records,
experiment tracking,
deployment logs,
evaluation records.

Repeatability is not bureaucracy. It is how teams debug, improve, scale, and hand over AI systems.
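
As one example of a snapshot mechanism, a dataset or document folder can be identified by a content fingerprint that is stored with each experiment or index build. The sketch below hashes relative paths and file contents with SHA-256; the function name is hypothetical, and reading whole files into memory is a simplification that suits small collections only.

```python
import hashlib
from pathlib import Path


def fingerprint_directory(root: Path) -> str:
    """Return a SHA-256 digest over relative paths and contents of all files."""
    digest = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

If the fingerprint recorded for a result matches the fingerprint of the folder today, the same snapshot is still in place. If it differs, something was added, removed, or edited since the run.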

6. Define the governance and security baseline

Infrastructure must implement governance, not merely document it. Policies are useful, but the system must enforce access control, logging, data retention, secrets management, and approval where needed.

The governance baseline should cover:

Identity: who is using the system?
Access: which data, documents, tools, models, and outputs may they access?
Data classification: what is public, internal, confidential, personal, or regulated?
Secrets: where are API keys, tokens, and credentials stored?
Logging: what is recorded, where, and for how long?
Human review: which actions require approval?
Deletion and retention: when should data, logs, and outputs be removed?
Auditability: can important decisions or outputs be reconstructed?

For RAG systems, the key principle is to enforce permissions before retrieval. The model should not receive documents the user is not allowed to see. For tool-using systems, the key principle is least privilege. The AI system should only be able to call the tools and actions required for the use case.
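
The "permissions before retrieval" principle can be shown with a toy example: documents are filtered by the user's groups first, and only the permitted ones are scored. The `Document` class, the group names, and the keyword-overlap scoring are all assumptions made for illustration; a real system would use its own retriever and identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset


def permitted_documents(docs: list, user_groups: set) -> list:
    """Keep only documents that share at least one group with the user."""
    return [d for d in docs if d.allowed_groups & user_groups]


def retrieve(query: str, docs: list, user_groups: set, top_k: int = 3) -> list:
    """Filter by permission first, then rank the remaining documents."""
    candidates = permitted_documents(docs, user_groups)
    terms = set(query.lower().split())
    # Toy scoring: count of shared words. Stands in for a real retriever.
    ranked = sorted(
        candidates,
        key=lambda d: len(terms & set(d.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]
```

Because filtering happens before ranking, a restricted document can never reach the model's context for a user outside its groups, regardless of how well it matches the query.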

A small internal experiment may need only basic rules. A customer-facing or regulated AI workflow needs a much stronger baseline.

7. Plan for monitoring from the beginning

Monitoring is often added too late. Teams build the AI system, show a demo, launch a pilot, and only then ask how to know whether it is working.

The infrastructure builder should include monitoring from the start. The signals depend on the workload, but common indicators include:

number of users and requests,
latency,
cost per request or workflow,
model-call failures,
retrieval failures,
invalid structured outputs,
guardrail triggers,
human override rate,
user feedback,
data freshness,
GPU or compute utilization,
storage growth,
unexpected spikes or anomalies.

Monitoring should support decisions. If latency is high, the team may reduce context size, cache responses, switch models, batch requests, or change hosting. If retrieval quality is poor, the team may improve chunking, metadata, or source selection. If costs rise, the team may need smaller models, fewer model calls, better routing, or batch processing.
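
A few of these signals can be computed from a plain request log before any dashboard exists. The sketch below assumes each log record is a dictionary with the hypothetical keys `latency_s`, `ok`, and `cost`, and uses a simple nearest-rank method for the 95th percentile.

```python
import math


def summarize(requests: list) -> dict:
    """Compute request count, p95 latency, failure rate, and mean cost per request."""
    n = len(requests)
    if n == 0:
        raise ValueError("no requests to summarize")
    latencies = sorted(r["latency_s"] for r in requests)
    # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-based.
    p95 = latencies[math.ceil(95 * n / 100) - 1]
    failures = sum(1 for r in requests if not r["ok"])
    return {
        "requests": n,
        "p95_latency_s": p95,
        "failure_rate": failures / n,
        "cost_per_request": sum(r["cost"] for r in requests) / n,
    }
```

Even a summary this small gives the owner something concrete to react to, for example a p95 latency that drifts upward week over week.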

A dashboard is only useful if someone owns the response.

8. Consider cost as an architecture constraint

AI infrastructure cost is not only hardware or cloud spend. It includes model API calls, token usage, GPU time, storage, data transfer, logging, monitoring, engineering time, support time, and maintenance.

The infrastructure builder should ask:

What is the expected usage volume?
What is the cost per request, document, job, or user?
What happens at 10x usage?
Are costs predictable or variable?
Are GPUs used efficiently?
Are large documents or long prompts increasing token costs?
Are repeated computations cached?
Are logs and embeddings growing without lifecycle rules?
Is engineering time being spent on infrastructure that could be bought or shared?

For startups and SMEs, cost visibility is more important than perfect cost modelling at the beginning. The team should at least know which parts of the system drive cost and what would happen if usage grows.
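
A rough cost model is often enough to answer the "what happens at 10x usage" question. The sketch below assumes a token-priced model API plus a fixed monthly cost. The function name, parameters, and all prices in the usage example are placeholders, not real provider rates.

```python
def monthly_cost(
    requests_per_month: int,
    tokens_in: int,
    tokens_out: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
    fixed_monthly: float = 0.0,
) -> float:
    """Estimate monthly spend: fixed costs plus per-request token costs."""
    per_request = (
        tokens_in / 1000 * price_in_per_1k + tokens_out / 1000 * price_out_per_1k
    )
    return fixed_monthly + requests_per_month * per_request


# Placeholder prices and volumes, for illustration only.
today = monthly_cost(10_000, 2000, 500, 0.001, 0.002, fixed_monthly=50.0)
at_10x = monthly_cost(100_000, 2000, 500, 0.001, 0.002, fixed_monthly=50.0)
print(round(today, 2), round(at_10x, 2))
```

In this made-up case the variable part grows tenfold while the fixed part stays the same, which shows at a glance which component drives cost as usage grows.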

9. Match infrastructure to available skills

A theoretically elegant architecture can fail if nobody can operate it. Skills are therefore part of the infrastructure plan.

Managed APIs require less infrastructure skill but still need prompt design, evaluation, data-handling awareness, and cost monitoring. No-code workflows require process ownership, connector awareness, testing, and documentation. Self-hosted models require deployment, GPU, security, monitoring, and operations skills. HPC requires account setup, job scheduling, environment management, data staging, and performance awareness. Hybrid infrastructure requires coordination across several environments.

The infrastructure builder should ask:

Who can build the first version?
Who can review it?
Who can deploy it?
Who can monitor it?
Who can respond when it fails?
Who can update it after three months?
Which skills must be hired, trained, or sourced through partners?

The right architecture is one the organization can realistically operate, not just one that looks good in a diagram.

10. Build, buy, rent, or share

Infrastructure decisions often become build-versus-buy decisions. For AI, the options are broader: build, buy, rent, or share.

Build when the capability is strategic, highly specific, or requires strong internal control.
Buy when a mature product solves a common problem better and faster than you could build it.
Rent when usage is temporary, spiky, or uncertain, such as cloud compute or managed model endpoints.
Share when public or ecosystem infrastructure is available, such as EuroHPC or AI Factory access for suitable workloads.

The mistake is to treat building as inherently superior. Building gives control, but also creates maintenance obligations. Buying gives speed, but also creates dependency. Renting gives flexibility, but may become expensive at scale. Shared infrastructure can unlock advanced compute, but requires onboarding and workload preparation.

A good infrastructure plan chooses deliberately and documents why.

11. Design for change

AI infrastructure should assume change. Models change. Providers change. Costs change. Regulations change. Data sources change. User expectations change. A system that cannot evolve will become expensive to maintain.

Designing for change does not mean abstracting everything. Too much abstraction too early slows teams down. It means preserving optionality in the places that matter.

Practical ways to preserve optionality include:

store important data in controlled systems and open formats where possible,
keep prompts and evaluation sets outside one vendor’s interface,
record model and prompt versions,
avoid unnecessary provider-specific dependencies in early prototypes,
containerize workloads that may need to move,
separate business logic from model-provider calls,
maintain evaluation cases that can compare model or provider changes.

Optionality is not about avoiding all lock-in. It is about avoiding accidental lock-in in the parts of the system that are strategically important.
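
Separating business logic from model-provider calls can be as small as one interface. In the sketch below, `TextModel` is a hypothetical protocol with a single `complete` method, and `EchoModel` is a stand-in used for testing. A real provider client would be wrapped in a class with the same method.

```python
from typing import Protocol


class TextModel(Protocol):
    """Anything that can turn a prompt into text (assumed interface)."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider for tests; returns the prompt with a marker."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


def summarize_ticket(ticket_text: str, model: TextModel) -> str:
    """Business logic: builds the prompt and cleans the result.

    It does not know which provider sits behind `model`.
    """
    prompt = f"Summarize this support ticket in one sentence:\n{ticket_text}"
    return model.complete(prompt).strip()
```

Switching providers then means writing one new wrapper class, and the same evaluation cases can be run against the old and new wrapper to compare them.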

12. Infrastructure patterns for common SME AI scenarios

Scenario 1: Internal productivity with AI tools

The simplest infrastructure pattern is approved tool access plus usage guidelines.

Enterprise AI tool or managed model access.
Clear rules for sensitive data.
User training and examples.
Human review for important outputs.
Lightweight feedback on useful use cases.

This is appropriate when the system does not integrate deeply with core business systems.

Scenario 2: RAG assistant for internal documents

A RAG assistant needs more structure.

Curated document collection.
Document owners and freshness process.
Metadata and access control.
Embedding and retrieval infrastructure.
Model endpoint or API.
Evaluation questions and retrieval tests.
Logs and user feedback.

Here the knowledge base is part of the infrastructure.

Scenario 3: No-code AI workflow

A no-code workflow needs operational guardrails even if it looks simple.

Approved connectors and credentials.
Defined workflow owner.
Prompt and output-format documentation.
Validation before writing to other systems.
Error handling and alerts.
Periodic review of cost, failures, and usefulness.

The visual builder is only the interface. The workflow still needs governance.
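
"Validation before writing to other systems" can be illustrated with a small check on a model's structured output. The example assumes a hypothetical invoice-extraction step that should return JSON with three fields; the field names and types are invented for the sketch.

```python
import json
from typing import Optional

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"invoice_number": str, "total": (int, float), "currency": str}


def validate_extraction(raw: str) -> tuple[Optional[dict], list[str]]:
    """Return (data, errors). `data` is None unless every check passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["expected a JSON object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            # bool is excluded explicitly because it is a subclass of int.
            errors.append(f"wrong type for field: {name}")
    return (data if not errors else None), errors
```

Only records that come back without errors would be written to the downstream system. The rest can be routed to an alert or a human review step.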

Scenario 4: Fine-tuning or batch inference

A compute-heavy AI workload needs resource planning.

Prepared data and clear evaluation metric.
Reproducible code and environment.
GPU or HPC access where justified.
Job scripts or repeatable pipelines.
Experiment tracking.
Output storage and cleanup rules.
Performance and cost monitoring.

This pattern benefits from disciplined preparation before scaling.

Scenario 5: Customer-facing AI product feature

A customer-facing feature requires production-grade infrastructure.

Authentication and authorization.
Reliable hosting and deployment pipeline.
Input and output validation.
Model and prompt versioning.
Monitoring of quality, cost, latency, and incidents.
Fallbacks and rollback.
Support and escalation process.
Security and privacy review.

This is no longer a technical experiment. It is a product capability.

13. A practical infrastructure builder checklist

A useful infrastructure plan can be created with the following checklist:

Use case: What task, decision, or workflow does the AI support?
Stage: Is this an experiment, pilot, or production system?
Data: What data is needed, where does it live, and who owns it?
Sensitivity: Is the data public, internal, confidential, personal, or regulated?
Architecture: Is the right foundation a warehouse, lake, lakehouse, document store, or something simpler?
Hosting: Should the model run through an API, self-hosted endpoint, cloud platform, on-prem environment, edge, or HPC?
Compute: Is the workload CPU, GPU, storage, network, or latency constrained?
Governance: How are identity, access, secrets, logging, and retention handled?
Repeatability: Can code, prompts, data, models, and deployments be reconstructed?
Monitoring: What quality, cost, latency, usage, and failure signals will be tracked?
Skills: Who can build, operate, monitor, and improve the system?
Next step: What is the smallest safe implementation that produces learning or value?

This checklist is intentionally practical. It helps teams avoid starting with a platform decision before they understand the workload.

14. Common infrastructure builder mistakes

Mistake 1: Choosing tools before defining the workload

Teams may start with a vector database, GPU provider, orchestration framework, or cloud platform before they know what the AI system must actually do. This often leads to wasted effort.

Mistake 2: Treating data access as an afterthought

If data cannot be found, trusted, accessed, or governed, the model choice does not matter. Data readiness is an infrastructure requirement.

Mistake 3: Confusing prototype success with production readiness

A prototype may show value, but production needs authentication, monitoring, cost control, rollback, security, and ownership.

Mistake 4: Overbuilding because large companies do it

Big-tech infrastructure patterns may not fit SME workloads. Start with a scale that matches realistic needs.

Mistake 5: Underestimating operations

Self-hosted models, hybrid architectures, HPC workflows, and custom platforms all require people who can operate them. If those people do not exist, the architecture is risky.

Mistake 6: Ignoring cost until usage grows

Token usage, storage, GPU hours, logging, embeddings, and data transfer can all become expensive. Cost visibility should start during the pilot.

15. What this means for decision makers

Decision makers do not need to design every technical layer themselves. But they should be able to recognize whether an infrastructure plan is grounded in real constraints.

A good plan should clearly explain:

why this infrastructure is needed now,
what stage of maturity it supports,
what data and compute assumptions it makes,
how sensitive data is protected,
how costs are monitored,
who operates the system,
what happens when the system fails,
what can be changed later without starting over.

If an infrastructure plan cannot answer these questions, it is probably not yet ready. If it answers them in a simple, staged way, it is likely more useful than a sophisticated diagram with unclear ownership.

16. The staged infrastructure roadmap

A practical roadmap for startups and SMEs could look like this:

Stage 1 — Clarify: define the use case, users, data, risk, and success criteria.
Stage 2 — Prototype: use managed tools, small samples, and lightweight tracking.
Stage 3 — Control: add curated data, access rules, prompt/version tracking, and basic evaluation.
Stage 4 — Pilot: add real users, monitoring, feedback, cost tracking, and ownership.
Stage 5 — Operationalize: add deployment discipline, security review, rollback, incident handling, and support.
Stage 6 — Scale or specialize: add self-hosting, hybrid architecture, HPC, or platform components only where the workload justifies them.

This roadmap keeps infrastructure aligned with evidence. It avoids both premature complexity and fragile success.

Bottom line: Infrastructure building is not about assembling every possible AI platform component. It is about matching the use case, data, compute, governance, cost, skills, and maturity stage. Start with the smallest safe foundation, make it repeatable, monitor honestly, and add complexity only when the workload proves that it is needed.

🧩 Task

Interactive task: Build a realistic setup and generate an interpretation of the trade-offs.

Builder inputs: data architecture, model hosting, governance posture, compute strategy.

Module 7 · Summary

Key takeaways

What matters most when choosing architecture, hosting, and compute for AI initiatives.

Infrastructure is not the goal — it is the delivery system: good infrastructure makes AI work usable, repeatable, governable, scalable, and changeable. It should reduce friction, not create complexity for its own sake.
Start with the workload, not the technology: a RAG assistant, forecasting model, no-code workflow, fine-tuning job, and customer-facing AI product all need different infrastructure choices.
Match infrastructure to maturity: experiments need lightweight environments; pilots need access control, evaluation, and monitoring; production systems need deployment discipline, security, rollback, ownership, and incident response.
Choose data architecture based on the problem: warehouses are strong for trusted structured reporting, lakes for flexible raw and varied data, lakehouses for combining analytics and AI workloads, and data mesh principles for clarifying ownership across domains.
Hosting is a business decision, not only a technical one: managed model APIs offer speed and low operational burden; self-hosting offers control but requires skills and responsibility; hybrid hosting can balance both but adds coordination complexity.
Cloud, on-premises, HPC, and edge are deployment options: cloud is useful for speed and managed services, on-premises for control and locality, HPC for large-scale compute-heavy workloads, and edge for low-latency or local processing needs.
Hybrid infrastructure should be deliberate: use hybrid when different workloads genuinely need different environments. Avoid accidental hybrid setups where data, identity, monitoring, ownership, and costs become fragmented.
HPC and EuroHPC are accelerators when the workload is ready: they are valuable for large-scale training, fine-tuning, batch inference, embeddings, simulation-plus-AI, benchmarking, and compute-heavy experimentation — not for every AI project by default.
Data movement, storage, and environments matter: AI workloads often fail because files are hard to move, storage is slow, dependencies are fragile, or experiments cannot be reproduced — not because the model is weak.
Governance must be implemented in infrastructure: access control, secrets management, logging, retention, permissions, source traceability, and auditability need technical support, not only written policies.
Cost visibility should start early: track model calls, tokens, storage, embeddings, GPU hours, data transfer, logs, and operations effort before a prototype becomes expensive at scale.
The best infrastructure plan is staged: clarify the use case, prototype simply, add control during pilots, operationalize only when value is proven, and scale or specialize only when the workload justifies it.