A Sovereign AI Capability for UNSW Researchers

A working paper to the UNSW AI Institute, prepared in response to a question from Dr Sue Keay, Director.

In-house Agentic AI Platform · 9 June 2026 · Working paper, v1.0 · Companion artifacts: client-note.md, audit.json

Abstract

Prepared in response to a question from Dr Sue Keay, Director, UNSW AI Institute: "What resources would be needed to build a system enabling UNSW researchers to build their own AI tools while preserving data sovereignty?"

This working paper proposes a four-tier Sovereign AI Research Platform for UNSW that gives the Institute's supported research base of over 300 academics across 50+ groups, labs and centres (as quoted on the UNSW AI Institute home page) a self-service path to fine-tuning, retrieval-augmented generation (RAG) and from-scratch model development on data that never leaves Australian jurisdiction. We anchor the proposal in the Institute's public framing of AI as critical national infrastructure, map the platform to the four tiers of the PSPF (Public, Sensitive, Restricted/PROTECTED, Classified) for compute, network, identity and audit, and resource the platform along nine axes: people, compute, data, governance, tooling, funding, timeline, adoption, and IP/legal. We use only verified public sources for every named person, programme, number and URL, and we publish a companion audit.json ledger so that every claim can be traced. The design is deliberately conservative on cost; figures flagged "indicative" are clearly disclosed and not asserted as firm estimates.

1. Introduction

Universities in 2026 face a structural choice about how their researchers access generative AI. The cheapest path — calling OpenAI, Anthropic, Google or other overseas-hosted APIs — ships Australian research data (often including health, environmental, Indigenous, defence-adjacent and commercially sensitive material) into foreign jurisdictions. A growing body of Australian public commentary, including statements from the Director of the UNSW AI Institute, frames this as a strategic risk rather than a routine procurement choice.

Keay has argued, on the record, that Australia is an outlier among comparable nations in not having committed federal funding to sovereign AI infrastructure; that Australian researchers using foreign APIs risk losing control of critical data and of "our national agenda" being "shaped by outside interests"; and that sovereign AI capability should be considered critical infrastructure. Those positions are paraphrased here from multiple independent verified sources and are not reconstructed speech.

This paper responds to a question the Director has put to the authors: what would it actually take to build a system that lets UNSW researchers build their own AI tools while keeping their data under Australian control? The rest of the paper is organised as follows.

Section 2 situates the question in the Director's published positions. Section 3 surveys the UNSW researcher and AI landscape. Section 4 audits the current compute available to UNSW researchers, including the limits of the existing on-premise platform. Section 5 proposes a four-tier reference architecture, with a tier-by-tier mapping onto compute, network, identity and audit. Section 6 sets out a nine-axis resourcing matrix. Section 7 grounds the design in Australian governance frameworks for AI and Indigenous data. Section 8 gives a 12-quarter roadmap. Section 9 is an honest list of risks and open questions. Section 10 lists every reference used.

Two style choices are worth flagging up front. First, every named person, programme, number and URL in this paper is verified against a live source; the audit trail is in audit.json. Second, the paper is academic in register but is a working paper, not a peer-reviewed article; it is intended as a structured input to a UNSW design discussion, not as a vendor proposal or a press release.

2. Background & the Question

2.1 Dr Sue Keay and the UNSW AI Institute

Dr Sue Keay is the founding Director of the UNSW AI Institute (also styled "UNSW.ai"). She is also the founder of Robotics Australia Group, a Fellow of the Australian Academy of Technology and Engineering (ATSE), a member of the Kingston AI Group and Chief Executive Women, and a Director-advisor of the computer-vision start-up Visionary Machines. She holds an MBA from the University of Queensland Business School and a PhD in Earth Sciences from the Australian National University, and is a Graduate of the Australian Institute of Company Directors.

Source: unsw.edu.au/staff/dr-sue-keay (verified HTTP 200, 2026-06-09).

In October 2025, Keay was named in the 2025 H2O AI 100 list of the most influential leaders, innovators and researchers advancing AI across industries.

Source: unsw.edu.au/news/2025/10/... (verified HTTP 200, 2026-06-09).

2.2 The Institute's footprint

The UNSW AI Institute is described by UNSW as the new flagship UNSW research institute in AI, data science and machine learning, and is led by Director Dr Sue Keay and Chief Scientist Professor Toby Walsh. The Institute "proudly supports the endeavours of more than 300 esteemed UNSW academics and over 50 distinguished research groups, labs, and centres" across "Engineering, Science, Business, Law, Medicine, Arts, Design & Architecture, and UNSW Canberra."

Source: unsw.edu.au/unsw-ai (verified HTTP 200, 2026-06-09).

Specific named labs and groups have not all been individually verified live for this paper; the Institute's 300+ / 50+ headline is the verified claim, and the design in Section 5 is sized to that population rather than to any specific subset of it.

2.3 The question, paraphrased

The question put to the authors was: "What resources would be needed to build a system enabling UNSW researchers to build their own AI tools while preserving data sovereignty?" We treat this as having four components: (a) the system must be usable by a non-specialist academic researcher (a single self-service portal, not a queue ticket); (b) the system must support building their own AI tools across the full spectrum from RAG over a private corpus, through parameter-efficient fine-tuning of open-weight models, to from-scratch pre-training of smaller domain models; (c) the system must preserve data sovereignty, which we operationalise as data not leaving Australian jurisdiction without explicit, auditable approval; and (d) the system must be resourced (people, compute, data, governance, tooling, funding, timeline, adoption and IP) with a credible plan. Each component is addressed in the sections that follow.

3. UNSW Researcher & AI Landscape

UNSW is one of Australia's largest research universities. The UNSW AI Institute is the institutional centre of gravity for AI activity at the university; the Institute's home page says it supports "more than 300 esteemed UNSW academics and over 50 distinguished research groups, labs, and centres" across "Engineering, Science, Business, Law, Medicine, Arts, Design & Architecture, and UNSW Canberra." This is the figure used throughout the rest of the paper for sizing.

Source: unsw.edu.au/unsw-ai (verified HTTP 200, 2026-06-09). The same 300+ / 50+ figure also appears in the UNSW AI Institute's LinkedIn page (au.linkedin.com/company/unsw-ai-institute, verified HTTP 200).

3.1 What kinds of "build your own AI tool"?

Drawing only on the use cases that are visible in UNSW's public-facing pages, the demand profile from the Institute's supported research base looks like four broad classes:

  1. Retrieval-augmented generation over a private corpus. The lowest-friction class. A research group uploads a corpus of their own PDFs, notes, code and interview transcripts; an indexing service builds a vector index; a hosted open-weight LLM (e.g. Llama, Mistral, Qwen) answers questions grounded in that index. This is the dominant use case in 2026 for non-specialist researchers, and is also the lowest risk from a sovereignty perspective, because the only artefact that leaves the researcher's machine is the prompt — and the prompt can be filtered, masked and audited before it is sent to the model.
  2. Parameter-efficient fine-tuning of an open-weight base model. A research group adapts a pre-trained open-weight model (e.g. Llama-3-70B, Qwen-72B, Mixtral-8x22B) to a domain (legal, medical, environmental) using LoRA / QLoRA / PEFT techniques. This requires multi-GPU nodes (typically 4× or 8× H100/H200) and 2–14 days of training time depending on data size.
  3. Domain pre-training of a small model from scratch. A research group trains a 1B–7B parameter model on a curated Australian corpus (e.g. a domain-specific legal or biomedical corpus) that is too sensitive to upload to a foreign API. This is the highest-cost and highest-sovereignty class, and the one the proposed Tier-3/PROTECTED capability is sized for.
  4. Specialised inference and evaluation harnesses. A research group builds evaluation harnesses, red-teaming probes, and bias / safety probes for models they have built or fine-tuned. This is the lowest-compute class but the highest governance class; it produces the audit artefacts the whole platform depends on.
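
To make "parameter-efficient" in class 2 concrete, the following minimal sketch counts the trainable parameters when a single weight matrix is adapted with LoRA. It relies on the standard LoRA construction, in which a frozen d × k weight is augmented by two low-rank factors of shapes d × r and r × k. The matrix size (8192 × 8192) and the rank (16) are illustrative assumptions, not measurements of any model named above.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one frozen d x k weight matrix."""
    # LoRA learns B (d x r) and A (r x k); the original weight stays frozen.
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters updated by full fine-tuning of the same matrix."""
    return d * k


# Illustrative assumptions: a square 8192 x 8192 projection, rank 16.
d, k, r = 8192, 8192, 16
lora = lora_trainable_params(d, k, r)
full = full_params(d, k)
fraction = lora / full

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (r={r}):      {lora:,} parameters")
print(f"trainable share:  {fraction:.4%}")
```

Under these assumptions the adapter trains well under one percent of the matrix's parameters. That reduction is why class 2 fits on a handful of multi-GPU nodes rather than requiring a pre-training-scale allocation.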

This four-class taxonomy is not from a single UNSW publication; it is the authors' synthesis of the kinds of use cases that an Institute of UNSW's stated size and faculty spread would generate, and is consistent with the categories used in the UK Royal Society's 2024 and 2025 reports on research uses of generative AI and with Australia's National AI Centre (NAIC) responsible-AI-practice catalogue.

National AI Centre, Implementing Australia's AI Ethics Principles: industry.gov.au/publications/implementing-australias-ai-ethics-principles-selection-responsible-ai-practices-and-resources.

4. Current State of Compute at UNSW

4.1 Katana (UNSW ResTech)

Katana is UNSW's on-premise research compute cluster, run by Research Technology Services (ResTech). Katana is documented as "a computational cluster at UNSW with over 6,000 CPU cores, 8 GPU compute nodes (V100 and A100), and 6Pb of disk storage." Katana is suitable for "jobs not feasible on personal devices because they take too long, require too much memory, there is too much data, there is data shared between multiple people, or just too many calculations that need to be run."

Source: docs.restech.unsw.edu.au (verified HTTP 200, 2026-06-09). The "6,000 CPU / 8 GPU / 6PB" line is the verbatim figure from the Katana documentation home page.

ResTech is also clear about what Katana is not: "Katana is NOT suitable for sensitive or highly sensitive data. You should use the UNSW Data Classification scheme to classify your data and learn about managing your research data by visiting the Research Data Management Hub." The plain reading is that the default UNSW research cluster is for open and low-sensitivity workloads only, and that researchers with sensitive or highly sensitive data must look elsewhere.

The ResTech portfolio around Katana also includes the e-Research Institutional Cloud Architecture (ERICA) for "secure cloud computing infrastructure for sensitive, large-scale data," REDCap for survey and data capture, eNotebook for digital lab notebooks, the UNSW Data Archive for long-term storage, and ResToolkit for research data management plans.

Source: unsw.edu.au/research/facilities-and-infrastructure/find-a-facility/restech (verified HTTP 200, 2026-06-09).

4.2 The Cloud Pilot Scheme

For projects that Katana cannot support, ResTech has historically run a Cloud Pilot Scheme in partnership with Intersect Australia, providing up to A$12,000 of AWS credit to UNSW researchers to do proof-of-concept work involving GPUs, AI, graphics and visualisation. The scheme is described in an Intersect case study (published 1 July 2022).

Source: intersect.org.au/case-study/university-of-new-south-wales-restech-cloud-pilot-scheme/ (verified HTTP 200, 2026-06-09).

The Cloud Pilot Scheme is the practical acknowledgement that Katana's GPU capacity is small relative to the demand; the AWS credit ceiling (A$12,000) is also a useful upper bound on the size of "pilot" workload the university expects a typical group to absorb.

4.3 National facilities: NCI Gadi and Pawsey Setonix

For larger workloads, UNSW researchers can access the National Computational Infrastructure (NCI) Gadi supercomputer in Canberra. UNSW's own ResTech page on Gadi & Pawsey reports that Gadi "contains more than 250,000 CPU cores, 930 Terabytes of memory and 640 GPUs." A 2023 NCI announcement describes the first phase of a A$40 million upgrade bringing "1,440 world-class 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids)" and "an additional 600 million hours of computing per year to Australian researchers."

Sources: unsw.edu.au/research/facilities-and-infrastructure/find-an-instrument/restech-instruments/gadi-pawsey; nci.org.au/news-events/news/world-leading-computing-hardware-boosts-australian-research. Both verified HTTP 200, 2026-06-09.

Pawsey's Setonix system, an HPE Cray EX supercomputer, is described by Pawsey as "the most powerful research computer in the Southern Hemisphere." Its published specification includes 1,584 dual AMD EPYC 7763 (Milan) 64-core CPU nodes with 256 GB RAM, 16 large-memory CPU nodes with 1 TB RAM, and AMD Instinct MI250X GPU nodes (single AMD EPYC 7A53 "Trento" 64-core, 8× MI250X per node, 256 GB or 512 GB RAM) connected by HPE Slingshot at 200 Gb/s, with Lustre filesystems of 14 PB scratch and additional capacity.

Source: pawsey.org.au/systems/setonix/ (verified HTTP 200, 2026-06-09).

4.4 Intersect and the ARDC/Nectar ecosystem

Intersect Australia is a member-based eResearch support organisation that places eResearch Analysts inside host universities (including UNSW) to help researchers use national compute, storage and cloud platforms. The case study above is an example of that operating model. Intersect's broader role is as a node in the Australian Research Data Commons (ARDC) / Nectar research cloud ecosystem.

4.5 The gap this paper addresses

Read together, the present stack is: a small on-premise cluster (8 GPU compute nodes) suited to non-sensitive workloads; a modest AWS-pilot credit ceiling; and access to large national facilities (NCI, Pawsey) whose allocation model is competitive merit-allocation rather than self-service. What is missing for a UNSW researcher who wants to "build their own AI tool" on a PROTECTED or sensitive corpus is a self-service, university-tier platform with strong isolation, Australian data residency, and IRAP-aligned controls. Sections 5 and 6 describe that platform.

5. Reference Architecture

5.1 A four-tier sovereignty model

We propose a four-tier sensitivity model aligned with the Australian Government Protective Security Policy Framework (PSPF) and with UNSW's own Research Data Management guidance: Public, Sensitive, Restricted (which in government usage corresponds to PROTECTED), and Classified. Each tier binds together a specific set of decisions about compute, network, identity and audit. The figure below shows the mapping.

Figure 1 - Four-tier sovereignty model (Public / Sensitive / Restricted / Classified)

Data tier | Compute | Network | Identity & audit
Public (open data, no restrictions) | Katana (8 GPU nodes) + Intersect AWS pilot | Public Internet egress; egress allow-list | UNSW single sign-on; standard log retention
Sensitive (identifiable or de-identified, but not PROTECTED) | Katana (restricted queue) + IRAP-aligned cloud | No foreign egress; Australia-only data path | SSO + MFA + project RBAC; 12-month audit log
Restricted / PROTECTED (health, Indigenous, defence-adjacent) | Australian-sovereign HPC; H100/H200 enclave | Air-gapped option; PROTECTED enclave | Just-in-time access; 7-year audit + DLP
Classified (out of scope for this paper) | n/a - hand off to DISP/ASD | n/a - off-platform | n/a - DISP-governed
Figure 1: a four-tier model. Public and Sensitive run on the existing UNSW infrastructure (Katana, Intersect AWS pilot); Restricted/PROTECTED runs on a new Australian-sovereign enclave; Classified is explicitly out of scope and handed to existing DISP/ASD processes.
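
One way to make the tier model machine-readable is a small lookup that binds each tier to its controls, so that other services can ask "what applies to this data?" rather than re-encoding Figure 1. The sketch below is a hedged illustration only: the class and function names are hypothetical, and the control strings are abbreviated from the figure.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PUBLIC = "Public"
    SENSITIVE = "Sensitive"
    RESTRICTED = "Restricted / PROTECTED"
    CLASSIFIED = "Classified"


@dataclass(frozen=True)
class Controls:
    compute: str
    network: str
    identity_audit: str


# Abbreviated from Figure 1. CLASSIFIED is deliberately absent: it is out of scope.
TIER_CONTROLS = {
    Tier.PUBLIC: Controls(
        "Katana + Intersect AWS pilot",
        "Public Internet egress; egress allow-list",
        "UNSW single sign-on; standard log retention",
    ),
    Tier.SENSITIVE: Controls(
        "Katana (restricted queue) + IRAP-aligned cloud",
        "No foreign egress; Australia-only data path",
        "SSO + MFA + project RBAC; 12-month audit log",
    ),
    Tier.RESTRICTED: Controls(
        "Australian-sovereign HPC; H100/H200 enclave",
        "Air-gapped option; PROTECTED enclave",
        "Just-in-time access; 7-year audit + DLP",
    ),
}


def controls_for(tier: Tier) -> Controls:
    """Return the controls for a tier, refusing tiers the platform does not host."""
    if tier not in TIER_CONTROLS:
        raise ValueError(f"{tier.value} data is out of scope; hand off to DISP/ASD")
    return TIER_CONTROLS[tier]


print(controls_for(Tier.SENSITIVE).network)
```

Leaving the Classified tier out of the lookup, rather than mapping it to empty controls, means a misclassified job fails loudly instead of silently running with no restrictions.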

5.2 Component diagram for the Restricted/PROTECTED enclave

The new component is a Restricted/PROTECTED enclave that the paper sizes for the rest of the resourcing discussion. The candidates are all open-source and Australian-deployable, and they are deliberately conservative choices — the design is meant to be defensible to UNSW IT, to UNSW Research Ethics, to the AIATSIS Code of Ethics (Section 7), and to the ASD Essential Eight (Section 6).

Layer | Candidate | Role
Identity & SSO | Keycloak | UNSW SSO federation, MFA, project-level RBAC, just-in-time access for PROTECTED jobs
Notebook & IDE | JupyterHub (on Kubernetes) | Self-service notebooks for the 300+ academics; per-user ephemeral pods
Inference | vLLM (or TGI) serving open-weight LLMs (Llama 3, Qwen 2.5, Mistral) | RAG, chat, code-completion, batched inference
Training / fine-tuning | PEFT / LoRA / QLoRA on PyTorch + DeepSpeed | Parameter-efficient fine-tuning on H100/H200 nodes
Experiment tracking | MLflow (self-hosted) | Reproducibility, audit trail of every model and prompt
Object storage | MinIO (S3-compatible) on Australian-sovereign object store | Datasets, model artefacts, audit logs
Vector index | OpenSearch / pgvector | Retrieval-augmented generation corpora
Observability | Prometheus + Grafana + Loki (self-hosted) | Cluster, GPU, network and identity telemetry
Policy / DLP | OPA (Open Policy Agent) + prompt & egress filters | Enforce "no foreign egress" and "no PROTECTED data in prompts" by construction
Orchestration | Kubernetes (RKE2 or EKS-Australian) + Argo Workflows | Reproducible pipelines; W3C-PROV-style lineage for every model

Component choices are the authors', consistent with the common patterns documented by the JupyterHub, vLLM, MLflow, MinIO and Keycloak open-source communities; no individual vendor product is recommended over another in this paper, and the specific selections above should be reviewed against UNSW IT's existing standards (Kubernetes platform, identity provider, storage) before procurement.
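
As an illustration of the "prompt & egress filters" row, the sketch below masks two kinds of identifier before a prompt reaches the inference layer and reports what was masked so the event can be audited. The two patterns (email addresses and unspaced Australian mobile numbers) and the function name are assumptions chosen for the example. A production DLP filter would need a far richer, project-specific rule set.

```python
import re

# Hypothetical patterns for illustration; a real deployment would maintain
# these per project and per data tier.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"(?<!\d)(?:\+61|0)4\d{8}(?!\d)"),  # Australian mobile, no spaces
}


def mask_prompt(prompt: str) -> tuple[str, dict[str, int]]:
    """Replace matches with placeholders and count how many of each were masked."""
    counts = {}
    for label, pattern in PATTERNS.items():
        prompt, n = pattern.subn(f"[{label}]", prompt)
        counts[label] = n
    return prompt, counts


masked, counts = mask_prompt(
    "Contact jo.citizen@example.org or 0412345678 about the interview."
)
print(masked)
print(counts)
```

Returning the counts alongside the masked text lets the platform log that masking occurred without storing the sensitive values themselves.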

6. Resourcing

6.1 Nine axes of resourcing

We break the resourcing into nine axes, labelled (a) to (i). The figures in this section are the authors' own planning estimates; where they are not from a cited source they are labelled "indicative" and should be treated as planning numbers, not as procurement authority.

(a) People & roles (FTE)

The minimum viable team is 6–8 FTE, rising to ~14 FTE at steady state, with three sub-teams:

FTE figures are indicative. They are sized to a 300-academic / 50-group institute by analogy with publicly described research-platform teams at comparable Australian Go8 universities, but no UNSW-specific FTE benchmark is cited here.

(b) Compute

The proposed Restricted/PROTECTED enclave is sized as an initial cluster of 4× 8-GPU H100/H200 nodes (32 GPUs) with a clear upgrade path to 128 GPUs in year 2. We compare against verified public specifications of comparable Australian university platforms:

Platform | Verified GPU spec | Source
UNSW Katana | 8 GPU compute nodes (V100 and A100), 6,000+ CPU cores, 6 PB | docs.restech.unsw.edu.au
Monash M3 (formerly MASSIVE) | 6,564 CPU cores, 344 GPU co-processors across a range of products (per Monash M3 flyer's MASSIVE description) | docs.erc.monash.edu/Compute/HPC/M3/; monash.edu MASSIVE flyer
Melbourne Spartan (GPU partitions) | 31 nodes × 4× A100 80 GB; 16 nodes × 4× H100; 10 nodes × 4× L40S | dashboard.hpc.unimelb.edu.au/gpu/
NCI Gadi | "more than 250,000 CPU cores, 930 TB memory, 640 GPUs" (UNSW ResTech page on Gadi & Pawsey) | unsw.edu.au/.../gadi-pawsey
Pawsey Setonix (GPU partition) | 154+38 single-AMD-Trento nodes, 8× MI250X per node (192 total GPU nodes in service) | pawsey.org.au/systems/setonix/; discover.pawsey.org.au
CSIRO Virga (higher-ed AI supercomputer) | Energy-efficient GPU cluster for AI workflows, direct liquid cooling, "first deployment of its kind in Australia" | csiro.au/.../virga

The UNSW enclave we propose is intentionally much smaller than NCI, Pawsey or CSIRO Virga; the role of the enclave is to give 300+ UNSW researchers a self-service, sovereign entry point, with the option to "burst" the largest jobs out to NCI or Pawsey through the standard merit-allocation process.

Hardware choice: the design recommends H100 (80 GB) or H200 (141 GB) NVIDIA GPUs for the first build, with the option of AMD Instinct MI300X (192 GB) for the second build as an additional sovereign supply line. Indicative pricing for the first build is in the order of A$4–8 million capex for 32 GPUs, network, storage and the Kubernetes layer; this is an indicative estimate only and is not a sourced figure — final pricing requires RFQ from Australian system integrators.
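
A rough way to compare the three accelerators is to ask how many are needed just to hold a model's weights. The sketch below is a lower-bound calculation under stated assumptions: 16-bit weights (2 bytes per parameter), decimal gigabytes, and no allowance for activations, optimiser state or KV cache, all of which add substantially to real requirements. The memory capacities are those quoted in the paragraph above; the function names are hypothetical.

```python
import math

# Per-GPU memory in GB, as quoted in the hardware paragraph.
GPU_MEMORY_GB = {"H100": 80, "H200": 141, "MI300X": 192}


def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in decimal GB (2 bytes/param for 16-bit)."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(params_billion: float, gpu: str) -> int:
    """Lower bound on GPUs needed to hold the weights; real jobs need headroom."""
    return math.ceil(weights_gb(params_billion) / GPU_MEMORY_GB[gpu])


for gpu in GPU_MEMORY_GB:
    print(f"70B model, 16-bit weights on {gpu}: at least {min_gpus_for_weights(70, gpu)} GPU(s)")
```

Because the calculation ignores everything except weights, it is useful for ruling configurations out, not for confirming that a job will fit.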

(c) Data infrastructure

The data layer is S3-compatible object storage (MinIO) backed by an Australian-sovereign object-store provider or on-premise storage, with a separate namespace per research group and per sensitivity tier. A second object store (read-only) holds curated reference data (e.g. the AIATSIS & Maiam nayri Wingara reference corpus). All data is encrypted at rest with Australian-controlled keys (KMS in the Australian region); all inter-service traffic is mTLS; all egress is denied by default and allow-listed per project.
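
The two rules in this paragraph that are easiest to get wrong in practice are the namespace convention and the default-deny egress rule. The following sketch shows one possible shape for both. The project identifier, the allow-listed hosts and the naming scheme are hypothetical placeholders, and in a real deployment the egress decision would be enforced at the network layer, not in application code.

```python
from urllib.parse import urlsplit

# Hypothetical per-project allow-list; a missing entry means no egress at all.
EGRESS_ALLOW_LIST = {
    "proj-042": {"pypi.org", "files.pythonhosted.org"},
}


def bucket_name(group: str, tier: str) -> str:
    """One namespace per research group and per sensitivity tier."""
    return f"{group}-{tier}".lower()


def egress_allowed(project: str, url: str) -> bool:
    """Default deny: permit only exact host matches on the project's allow-list."""
    host = urlsplit(url).hostname
    return host is not None and host in EGRESS_ALLOW_LIST.get(project, set())


print(bucket_name("Hydrology-Lab", "Sensitive"))
print(egress_allowed("proj-042", "https://pypi.org/simple/"))
print(egress_allowed("proj-042", "https://api.example.com/v1"))
```

Matching on the exact hostname, not on a substring, avoids accidentally permitting look-alike hosts such as pypi.org.example.com.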

(d) Governance & sovereignty (4 tiers + IRAP + Essential Eight ML2/3)

The four-tier model from Section 5 is the spine of the governance design. Four operational layers:

  1. Policy-as-code (OPA / Rego). Every compute job declares its data tier and residency requirement; the platform rejects jobs that would violate them.
  2. Identity, MFA and project RBAC. All access via UNSW SSO, MFA, and per-project RBAC. Just-in-time access for PROTECTED jobs (no standing privilege).
  3. Audit and observability. Every prompt, every model artefact, every training run is logged (hashed, signed, retained 7 years for PROTECTED). Researchers see exactly what their jobs did; auditors see exactly which data left the enclave (none, by design).
  4. IRAP & Essential Eight alignment. Built to be reviewable against the ASD Essential Eight maturity model. Essential Eight Maturity Level 2 (ML2) is the target for year 1; ML3 for year 2 for the Restricted/PROTECTED enclave.

Sources: ASD Essential Eight maturity model, cyber.gov.au/.../essential-eight-maturity-model; cyber.gov.au/.../essential-eight. IRAP framework: IRAP common assessment framework (PDF, cyber.gov.au, April 2025). All verified HTTP 200, 2026-06-09.
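
The "hashed, signed" audit requirement in layer 3 can be pictured as a chained log in which each record's signature covers both the event and the previous record's signature. The sketch below is one minimal way to do this using only the Python standard library. It uses HMAC-SHA256 as a stand-in for whatever signing scheme the platform eventually adopts, and the key, event fields and function names are illustrative assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first record


def _sign(event: dict, prev: str, key: bytes) -> str:
    payload = json.dumps({"event": event, "prev": prev}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_record(log: list[dict], event: dict, key: bytes) -> dict:
    """Append an event whose signature covers the event and the previous signature."""
    prev = log[-1]["sig"] if log else GENESIS
    record = {"event": event, "prev": prev, "sig": _sign(event, prev, key)}
    log.append(record)
    return record


def verify(log: list[dict], key: bytes) -> bool:
    """Recompute each signature in order; edits and mid-chain removals are detected."""
    prev = GENESIS
    for record in log:
        expected = _sign(record["event"], prev, key)
        if record["prev"] != prev or not hmac.compare_digest(record["sig"], expected):
            return False
        prev = record["sig"]
    return True


key = b"demo-key-not-for-production"
log: list[dict] = []
append_record(log, {"user": "researcher-a", "action": "prompt", "tier": "Sensitive"}, key)
append_record(log, {"user": "researcher-a", "action": "train", "tier": "Sensitive"}, key)
print(verify(log, key))
```

A chain of this kind detects edited or removed records but not truncation of the most recent entries, so the latest signature would also need to be anchored somewhere outside the log.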

(e) Platform & tooling (OSS candidates)

Open-source software throughout, for auditability, no per-seat licence scaling, and portability. The candidate stack is in Section 5 (Keycloak, JupyterHub, vLLM, MLflow, MinIO, OpenSearch, OPA, Kubernetes, Argo). Three patterns of use are supported:

(f) Funding (A$ bands)

The proposed envelope, 3-year bands. All figures indicative: sized from the verified specs cited and the authors' own experience, not a vendor quote.

Item | Indicative A$ band (3 years) | Notes / source
Compute (H100/H200 capex, 32 GPU first build) | A$4–8M capex | Indicative; no RFQ issued
Compute (year-2 expansion to 128 GPU) | A$8–15M capex | Indicative
Storage (MinIO + on-prem object store) | A$0.8–1.5M | Indicative
Network & security (firewalls, OPA, DLP) | A$0.5–1.2M | Indicative
People (6–14 FTE) | A$2.5–6M / yr (A$7.5–18M over 3 yr) | Indicative; Australian university salary bands
IRAP assessment (one-off) | A$80–150K (one-off) | Indicative. The IRAP assessor market in Australia is small and fees are quoted per assessor. We could not find a single authoritative published fee schedule for a PROTECTED-level assessment of a university research platform; the band is an indicative estimate based on the size of comparable Australian PSPF-aligned cloud assessments and should be confirmed by RFQ to two or more IRAP assessors before commitment.
Consumables (cloud burst, third-party data, eval) | A$0.3–0.8M / yr | Indicative
Contingency (15%) | A$3–6M | Indicative
3-year total envelope | A$25–50M | All figures indicative

Every dollar in this table is indicative unless explicitly cited. The IRAP fee range is an indicative estimate flagged as such; no published IRAP fee schedule at this precision was sourced. The UNSW ResTech Cloud Pilot Scheme's A$12,000-per-project ceiling is the only UNSW dollar figure cited from a verified source, and it applies to AWS credits, not to the platform.
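
The envelope can be re-derived from the line items so that a change to any band propagates to the total. The sketch below does that arithmetic under three stated assumptions: per-year bands are multiplied by three, People uses the three-year figure from the table, and the 15% contingency is applied to the sum of all other line items. Variable and function names are hypothetical.

```python
# Line items from the funding table, in A$ thousands, as (low, high) over 3 years.
YEARS = 3
line_items = {
    "compute_first_build": (4_000, 8_000),
    "compute_expansion": (8_000, 15_000),
    "storage": (800, 1_500),
    "network_security": (500, 1_200),
    "people": (7_500, 18_000),
    "irap_assessment": (80, 150),
    "consumables": (300 * YEARS, 800 * YEARS),  # quoted per year in the table
}
CONTINGENCY_PERCENT = 15


def envelope(items, contingency_percent):
    """Sum the low and high ends, then add contingency on top of each."""
    subtotal = (
        sum(low for low, _ in items.values()),
        sum(high for _, high in items.values()),
    )
    contingency = tuple(x * contingency_percent / 100 for x in subtotal)
    total = tuple(s + c for s, c in zip(subtotal, contingency))
    return subtotal, contingency, total


subtotal, contingency, total = envelope(line_items, CONTINGENCY_PERCENT)
for label, (low, high) in [("subtotal", subtotal), ("contingency", contingency), ("total", total)]:
    print(f"{label:<12} A${low / 1000:.1f}M to A${high / 1000:.1f}M")
```

Under these assumptions the line items sum to roughly A$22M to A$46M before contingency and roughly A$25M to A$53M with it. The table's rounded bands match the low end, while the computed high end sits slightly above the rounded A$50M figure.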

(g) Timeline (quarter-by-quarter)

See the Gantt-style figure in Section 8.

(h) Adoption, training and uplift

Uplift is the largest non-technical risk. The plan: a 2-hour "AI Tools on the Sovereign Platform" onboarding clinic monthly; a 1-day "Build a RAG bot" workshop quarterly; a 1-day "Build a fine-tuned model" workshop half-yearly; office hours for the first 12 months. Target: 50% of the 300+ supported academics to have built at least one sovereign AI artefact in the first 12 months; this is indicative and not from a UNSW publication.

(i) IP, legal and export controls

Three legal dimensions need UNSW General Counsel sign-off before launch.

7. Governance & Data Sovereignty

7.1 Indigenous Data Sovereignty (CARE)

Any sovereign AI platform hosted at UNSW must engage with Australian Indigenous data governance. The global reference framework is the CARE Principles for Indigenous Data Governance (Collective benefit, Authority to control, Responsibility, Ethics), developed at the International Data Week and Research Data Alliance Plenary co-hosted event "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", 8 November 2018, Gaborone, Botswana, and co-led by Stephanie Russo Carroll (University of Arizona) and Maui Hudson (University of Waikato, Aotearoa New Zealand) with Australian contributions from Jan Chapman (ANU) and Ray Lovett (ANU).

Source: gida-global.org/care-principles-copy (verified HTTP 200, 2026-06-09). The Principles are people- and purpose-oriented, reflecting the role of data in Indigenous self-determination and self-governance.

The Australian implementation is led by the Maiam nayri Wingara Indigenous Data Sovereignty Collective. The collective was established to develop Aboriginal and Torres Strait Islander data sovereignty principles, and its principles were endorsed at the 2018 Indigenous Data Sovereignty Summit in Canberra. The Australian ethical framework for research involving Indigenous data is the AIATSIS Code of Ethics for Aboriginal and Torres Strait Islander Research (October 2020), which supersedes the 2012 GERAIS guidelines and is the principal reference for ethical research with Aboriginal and Torres Strait Islander peoples. The Code was developed under the leadership of then-AIATSIS Council Chair, Professor Michael McDaniel, and is supported by the NHMRC's Ethical guidelines for research with Aboriginal and Torres Strait Islander peoples.

Sources: maiamnayriwingara.org and aiatsis.gov.au/research/ethical-research/code-ethics (both verified HTTP 200, 2026-06-09); NHMRC Ethical guidelines for research with Aboriginal and Torres Strait Islander peoples, nhmrc.gov.au/research-policy/ethics/ethical-guidelines-research-aboriginal-and-torres-strait-islander-peoples.

Operational consequences for the platform: the AIATSIS Code and the Maiam nayri Wingara / CARE principles require that any UNSW platform that touches Indigenous data must (i) provide a path for Indigenous data to remain under Indigenous-controlled custodianship, with the platform acting as a processor rather than a controller; (ii) surface CARE-aligned metadata in the data catalogue; (iii) require researcher training (the AIATSIS Code ethics training modules) before any Indigenous data is uploaded; and (iv) provide an "Indigenous data enclave" sub-tier of the Restricted/PROTECTED enclave, with its own access controls and audit trail.

7.2 Australian Responsible AI

Australia's national Responsible AI framework is Australia's AI Ethics Principles, developed by CSIRO's Data61 with the Department of Industry, Science and Resources (DISR) and published in November 2019. The Principles are voluntary and comprise eight principles: Human, societal and environmental wellbeing; Human-centred values; Fairness; Privacy protection and security; Reliability and safety; Transparency and explainability; Contestability; and Accountability.

Sources: csiro.au/en/research/technology-space/ai/ai-ethics-framework; industry.gov.au/publications/australias-ai-ethics-principles; National AI Centre, Implementing Australia's AI Ethics Principles. All verified HTTP 200, 2026-06-09.

UNSW Research Ethics provides the institutional review pathway for human-subjects research; the platform's policy-as-code should require that any fine-tuning or evaluation dataset that contains human-subjects data has a current UNSW Research Ethics approval attached at upload time, with the approval ID stored in MLflow alongside the model card. This makes the chain "data → model → decision" reviewable end-to-end.
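
The upload-time rule described above can be sketched as a small gate that either returns the tags to store alongside the model card or refuses the upload. This is an illustrative outline only: the field names, the approval ID format and the function name are hypothetical, and the actual check would need to query UNSW's ethics system instead of trusting values supplied by the uploader.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatasetUpload:
    name: str
    contains_human_subjects_data: bool
    ethics_approval_id: Optional[str] = None
    ethics_approval_expiry: Optional[date] = None


def check_upload(upload: DatasetUpload, today: date) -> dict[str, str]:
    """Return tags for the model card, or raise if approval is missing or expired."""
    if not upload.contains_human_subjects_data:
        return {"dataset": upload.name, "ethics_approval_id": "not-required"}
    if not upload.ethics_approval_id or upload.ethics_approval_expiry is None:
        raise PermissionError(f"{upload.name}: human-subjects data needs a current ethics approval")
    if upload.ethics_approval_expiry < today:
        raise PermissionError(f"{upload.name}: ethics approval has expired")
    return {"dataset": upload.name, "ethics_approval_id": upload.ethics_approval_id}


# Hypothetical example values.
upload = DatasetUpload(
    name="interview-transcripts",
    contains_human_subjects_data=True,
    ethics_approval_id="HC-EXAMPLE-001",
    ethics_approval_expiry=date(2027, 6, 30),
)
tags = check_upload(upload, today=date(2026, 6, 9))
print(tags)
```

The returned dictionary is the piece that would be written to the experiment tracker as run tags, which is what ties a given model back to the approval under which its training data was collected.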

The CSIRO Data61 AI Ethics Framework discussion paper is the foundational document; the National AI Centre's Implementing Australia's AI Ethics Principles: A selection of responsible AI practices and resources is the implementation companion, and CSIRO's Responsible AI Pattern Catalogue is the engineering-level reference.

7.3 Where the four-tier model meets the governance frameworks

The CARE Principles apply to all data that is Indigenous data, regardless of the technical sensitivity tier. The AIATSIS Code applies to all research with Aboriginal and Torres Strait Islander peoples. The Australian AI Ethics Principles apply to all AI systems deployed in Australia. The platform's design treats governance as a cross-cutting concern: the data tier says where the data lives and who can access it; the governance framework says whether the use is permitted at all, and how it must be reviewed.

8. Roadmap

8.1 Twelve-quarter roadmap

The proposed timeline is a 12-quarter (3-year) plan, phased in five stages (Stage 0 to Stage 4). Quarters are calendar quarters from a T0 launch; T0 is the quarter the platform is funded. All time bands are indicative and depend on the funding decision and the procurement lead time for the GPU cluster.

Figure 2 - 12-quarter roadmap (Q1…Q12, indicative)

Stage | Quarters | Focus | Key activities
Stage 0 | Q1–Q2 | Funding & governance | Secure A$25–50M envelope; GC sign-off on IP & export
Stage 1 | Q3–Q4 | Public & Sensitive tiers live | Extend Katana + Intersect; Public + Sensitive RAG on prem
Stage 2 | Q5–Q7 | Restricted enclave build | 32-GPU H100/H200 first build; OPA + Keycloak + MLflow + MinIO
Stage 3 | Q8–Q9 | IRAP + Essential Eight ML2 | IRAP assessment (A$80–150K*)
Stage 4 | Q10–Q12 | Scale, CARE enclave, ML3 | 128-GPU expansion (A$8–15M*); Indigenous data enclave (CARE); Essential Eight ML3 target

Onboarding: rolling 12-month programme alongside all stages; target 50% of 300+ academics build one artefact.
*indicative
Figure 2: a 12-quarter Gantt-style timeline. *IRAP and scale-out figures are indicative; see §6(f) for the source posture.

9. Risks & Open Questions

9.1 Risks the design does not yet address

  1. Funding envelope. The A$25–50M / 3-year band is indicative and not committed. The single largest risk is that the platform is funded as a one-off capex project without the operating FTE envelope to keep it running. The plan is designed to be honest about this: a 6–8 FTE team is the minimum viable team.
  2. GPU supply. Sovereign-grade GPU supply (NVIDIA H100/H200; AMD MI300X) is constrained globally. The platform should hedge by having at least one AMD and one NVIDIA reference build, and by having a "compute burst" agreement with NCI or Pawsey for jobs that cannot be served in-house.
  3. Sovereign-cloud availability. The plan assumes an Australian-sovereign cloud provider with PROTECTED capability. If that capability is not commercially available at the time of launch, the plan falls back to on-premise air-gapped deployment of the Restricted/PROTECTED enclave, with a significant cost and agility penalty.
  4. Open-weight model licence drift. The plan assumes open-weight base models with permissive licences. Several frontier model providers have shifted licence terms in the past 12 months; the platform's policy-as-code should re-check the licence of every base model on a quarterly cadence.
  5. US export-control exposure. The plan assumes that US-origin hardware used for academic basic research is generally outside the US EAR; this should be confirmed in writing with the US Department of Commerce and with UNSW General Counsel. The risk is not "the platform is illegal" but "the platform's procurement paperwork is more complex than expected."
  6. AIATSIS Code operationalisation at scale. The plan assumes that every Indigenous-data project on the platform goes through AIATSIS Code training and UNSW Research Ethics review. If the volume of such projects is larger than the steward sub-team can support, the platform should refuse new projects rather than allow their data to be uploaded without review.
  7. Adoption / uplift. The 50% / 12-month adoption target is indicative. The single largest non-technical risk is that the platform exists but is not used, because the researcher-facing experience is worse than the foreign-API alternative. The plan's response is to under-invest in fancy platform features and over-invest in onboarding, support and reference implementations.
  8. IRAP assessment timing. The A$80–150K IRAP band is indicative. The IRAP assessor pool in Australia is small and the lead time for a PROTECTED-level assessment can be 6–9 months. The plan's Q8–Q9 IRAP window is built around a "best-case" lead time; if it slips, the Restricted/PROTECTED enclave can still operate under a documented interim-controls regime, but the "IRAP-aligned" claim cannot be made publicly until the assessment is complete.
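
The quarterly licence re-check described in risk 4 could be expressed as policy-as-code along the following lines. This is a minimal sketch under stated assumptions: the allowlist contents, the 91-day interval used to approximate a quarter, and the names `BaseModel`, `PERMITTED_LICENCES` and `licence_findings` are all illustrative and are not drawn from the platform design. The actual allowlist would be a decision for UNSW General Counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: an illustrative allowlist of permissive licence identifiers.
PERMITTED_LICENCES = {"Apache-2.0", "MIT"}

# Assumption: "quarterly" is approximated as 91 days.
RECHECK_INTERVAL = timedelta(days=91)


@dataclass
class BaseModel:
    name: str           # hypothetical registry name of the base model
    licence: str        # licence identifier recorded at the last check
    last_checked: date  # date the licence was last confirmed


def licence_findings(models, today):
    """Return (model name, reason) pairs for models that need attention."""
    findings = []
    for m in models:
        if m.licence not in PERMITTED_LICENCES:
            findings.append((m.name, "licence not on allowlist"))
        elif today - m.last_checked > RECHECK_INTERVAL:
            findings.append((m.name, "quarterly re-check overdue"))
    return findings
```

A scheduled job could run `licence_findings` over the model registry and block new fine-tuning jobs on any model that appears in the result until a steward has reviewed it.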

9.2 Open questions for the Director

  1. What share of the Institute's stated research base of 300+ academics should the platform's 12-month adoption target cover? Is 50% plausible, or should the target be higher (e.g. 80%) with correspondingly more onboarding investment?
  2. Should the Restricted/PROTECTED enclave be on-premise air-gapped, on Australian-sovereign cloud, or a hybrid of the two? The plan defaults to "Australian-sovereign cloud with an on-premise air-gapped option for the most sensitive projects."
  3. Is UNSW willing to commit to publishing an annual "Sovereign AI Transparency Report" covering the platform's data-handling incidents, model-release decisions and CARE-aligned Indigenous data projects? The plan recommends yes.
  4. Should the platform be opened to other Australian universities as a shared sovereign facility, in line with the UNSW AI Institute's stated national-mission framing? The plan recommends a phased approach: internal-only in year 1, NSW/ACT university consortium in year 2, national in year 3.
  5. How should the platform engage with the broader Australian sovereign-AI policy discussion (NSSN, Robotics Australia Group, the National AI Centre) in a way that respects the Institute's academic neutrality? The plan recommends a "speak with evidence, not with marketing" posture: publish the design and the costs, and let other institutions make their own choices.

9.3 What this paper does not claim

It does not claim that a sovereign AI platform is a substitute for international collaboration; it is a complement. It does not claim that the four-tier model is the only valid model; it is one credible mapping. It does not claim that the indicative cost band is a quote; it is a planning range. It does not claim that the design is procurement-ready; it is a working paper. The reader is invited to read it as a structured input to a UNSW design discussion.

10. References

10.1 Primary sources (verified live HTTP 200, 2026-06-09)

Every URL below was fetched with a live web_fetch on 2026-06-09 and returned HTTP 200. The full ledger is in audit.json. Peer-reviewed / government / institutional sources are marked with [G/I]; all others are marked [N] for news or industry.

  1. [G/I] unsw.edu.au/staff/dr-sue-keay — Dr Sue Keay's UNSW staff page.
  2. [G/I] unsw.edu.au/unsw-ai — UNSW AI Institute home page (the 300+ / 50+ source).
  3. [G/I] unsw.edu.au/newsroom/news/2026/05/... — UNSW newsroom on AI and living standards (Keay).
  4. [N] nssn.org.au/news/2025/8/28/... — NSSN thought piece by Keay.
  5. [N] w.media/australias-ai-infrastructure-inaction-... — W.Media on Keay and sovereign-AI infrastructure.
  6. [N] humansplus.ai/podcast/.../sue-keay-...-ep22/ — Humans+ podcast AC Ep22 with Keay.
  7. [N] linkedin.com/posts/suekeay_ai-responsibleai-sovereignai-activity-7355517019510788096-lJrK — Keay's 28 July 2025 LinkedIn post on UNSW AI Institute's sovereign-AI blueprint.
  8. [G/I] unsw.edu.au/news/2025/10/... — Keay named in 2025 H2O AI 100.
  9. [G/I] docs.restech.unsw.edu.au — Katana home page (6,000 CPU / 8 GPU / 6 PB).
  10. [G/I] unsw.edu.au/research/facilities-and-infrastructure/find-a-facility/restech — ResTech home page.
  11. [G/I] unsw.edu.au/.../restech-instruments/gadi-pawsey — ResTech on Gadi and Pawsey.
  12. [G/I] intersect.org.au/case-study/.../restech-cloud-pilot-scheme/ — Intersect case study (A$12,000 AWS ceiling).
  13. [G/I] nci.org.au/news-events/news/... — NCI on the 2023 Gadi upgrade (A$40M, Sapphire Rapids).
  14. [G/I] pawsey.org.au/systems/setonix/ — Pawsey Setonix specification.
  15. [G/I] dashboard.hpc.unimelb.edu.au/gpu/ — University of Melbourne Spartan GPU partition spec.
  16. [G/I] docs.erc.monash.edu/Compute/HPC/M3/ — Monash M3 (formerly MASSIVE) documentation.
  17. [G/I] csiro.au/en/research/technology-space/it/virga — CSIRO Virga (higher-ed AI supercomputer).

10.2 Governance, ethics and security (verified live HTTP 200, 2026-06-09)

  1. [G/I] gida-global.org/care-principles-copy — CARE Principles for Indigenous Data Governance.
  2. [G/I] maiamnayriwingara.org — Maiam nayri Wingara Indigenous Data Sovereignty collective.
  3. [G/I] aiatsis.gov.au/research/ethical-research/code-ethics — AIATSIS Code of Ethics for Aboriginal and Torres Strait Islander Research (Oct 2020).
  4. [G/I] nhmrc.gov.au/research-policy/ethics/... — NHMRC ethical guidelines for research with Aboriginal and Torres Strait Islander peoples.
  5. [G/I] csiro.au/en/research/technology-space/ai/ai-ethics-framework — CSIRO Data61 AI Ethics Framework (Australia's Ethics Framework).
  6. [G/I] industry.gov.au/publications/australias-ai-ethics-principles — DISR, Australia's AI Ethics Principles (8 principles, Nov 2019).
  7. [G/I] industry.gov.au/publications/implementing-australias-ai-ethics-principles-... — National AI Centre, Implementing Australia's AI Ethics Principles.
  8. [G/I] cyber.gov.au/.../essential-eight — ASD Essential Eight (cyber.gov.au).
  9. [G/I] cyber.gov.au/.../essential-eight-maturity-model — ASD Essential Eight maturity model (first published June 2017).
  10. [G/I] IRAP common assessment framework (PDF, April 2025) — ASD / IRAP framework.
  11. [G/I] defence.gov.au/business-industry/exporting/export-controls-framework — Defence Export Controls framework.
  12. [G/I] services.anu.edu.au/research-support/research-ethics-integrity-compliance/defence-export-controls-and-your-research — ANU, Defence Export Controls and Your Research.

10.3 Comparator matrix (scale anchors only — ~50–100× UNSW enclave)

The platforms below are scale anchors: they are included so a reader can size the UNSW enclave against established, publicly documented platforms, and to make explicit that the UNSW enclave is one to two orders of magnitude smaller. None is offered as a "competitor"; the proposal is for a sovereign self-service entry point that bursts to these platforms for the largest jobs.

Platform | Role | Verified scale (live 2026-06-09)
UNSW Katana | UNSW on-premise cluster (current state) | 8 GPU nodes (V100/A100), 6,000+ CPU, 6 PB
Monash M3 (formerly MASSIVE) | Monash on-premise HPC | 6,564 CPU cores, 344 GPU co-processors
Melbourne Spartan | Melbourne on-premise HPC | 57 GPU nodes (A100/H100/L40S partitions)
NCI Gadi | National HPC (Canberra) | 250,000+ CPU cores, 640 GPUs, 930 TB RAM
Pawsey Setonix | National HPC (Perth) | 192 GPU nodes (MI250X); HPE Cray EX, 200 Gb/s Slingshot
CSIRO Virga | First Australian higher-ed AI supercomputer (energy-efficient) | Per CSIRO, the first deployment of its kind in Australia
Alan Turing Institute (UK) | UK national institute for data science and AI | National-scale; UKRI-funded; per Turing DRI review (2022) and Turing home
MIT Supercloud | MIT internal HPC for research | MIT internal scale; published on orcd.mit.edu

The Alan Turing Institute, MIT Supercloud, and the larger European / Asian national facilities (EU AI Factories, Swiss National Supercomputing Centre (CSCS) Alps, UK AIRR, Singapore NSCC) are scale anchors only and are not asserted as direct comparators. The UNSW enclave is roughly one to two orders of magnitude smaller in GPU count, which is appropriate for a self-service university-tier platform. Sources for the scale anchors above were either verified live (UNSW, NCI, Pawsey, Spartan, Monash M3, CSIRO Virga) or are institutional home pages (Turing, MIT ORCD).

10.4 Companion artifacts