Prepared in response to a question from Dr Sue Keay, Director, UNSW AI Institute: "What resources would be needed to build a system enabling UNSW researchers to build their own AI tools while preserving data sovereignty?"
This working paper proposes a four-tier Sovereign AI Research Platform for UNSW that gives the Institute's supported research base of over 300 academics across 50+ groups, labs and centres (as quoted on the UNSW AI Institute home page) a self-service path to fine-tuning, retrieval-augmented generation (RAG) and from-scratch model development on data that never leaves Australian jurisdiction. We anchor the proposal in the Institute's public framing of AI as critical national infrastructure, map the platform to the four tiers of the PSPF (Public, Sensitive, Restricted/PROTECTED, Classified) for compute, network, identity and audit, and resource the platform along nine axes: people, compute, data, governance, tooling, funding, timeline, adoption, and IP/legal. We use only verified public sources for every named person, programme, number and URL, and we publish a companion audit.json ledger so that every claim can be traced. The design is deliberately conservative on cost; figures flagged "indicative" are clearly disclosed and not asserted as firm estimates.
Universities in 2026 face a structural choice about how their researchers access generative AI. The cheapest path — calling OpenAI, Anthropic, Google or other overseas-hosted APIs — ships Australian research data (often including health, environmental, Indigenous, defence-adjacent and commercially sensitive material) into foreign jurisdictions. A growing body of Australian public commentary, including statements from the Director of the UNSW AI Institute, frames this as a strategic risk rather than a routine procurement choice.
Keay has argued, on the record, that Australia is an outlier among comparable nations in not having committed federal funding to sovereign AI infrastructure; that Australian researchers using foreign APIs risk losing control of critical data and of "our national agenda" being "shaped by outside interests"; and that sovereign AI capability should be considered critical infrastructure. Those positions are paraphrased here from multiple independent verified sources and are not reconstructed speech.
This paper responds to a question the Director has put to the authors: what would it actually take to build a system that lets UNSW researchers build their own AI tools while keeping their data under Australian control? The rest of the paper is organised as follows.
Section 2 situates the question in the Director's published positions. Section 3 surveys the UNSW researcher and AI landscape. Section 4 audits the current compute available to UNSW researchers, including the limits of the existing on-premise platform. Section 5 proposes a four-tier reference architecture, with a tier-by-tier mapping onto compute, network, identity and audit. Section 6 sets out a nine-axis resourcing matrix. Section 7 grounds the design in Australian governance frameworks for AI and Indigenous data. Section 8 gives a 12-quarter roadmap. Section 9 is an honest list of risks and open questions. Section 10 lists every reference used.
Two style choices are worth flagging up front. First, every named person, programme, number and URL in this paper is verified against a live source; the audit trail is in audit.json. Second, the paper is academic in register but is a working paper, not a peer-reviewed article; it is intended as a structured input to a UNSW design discussion, not as a vendor proposal or a press release.
Dr Sue Keay is the founding Director of the UNSW AI Institute (also styled "UNSW.ai"). She is also the founder of Robotics Australia Group, a Fellow of the Australian Academy of Technology and Engineering (ATSE), a member of the Kingston AI Group and Chief Executive Women, and a Director-advisor of the computer-vision start-up Visionary Machines. She holds an MBA from the University of Queensland Business School and a PhD in Earth Sciences from the Australian National University, and is a Graduate of the Australian Institute of Company Directors.
Source: unsw.edu.au/staff/dr-sue-keay (verified HTTP 200, 2026-06-09).
In October 2025, Keay was named in the 2025 H2O AI 100 list of the most influential leaders, innovators and researchers advancing AI across industries.
Source: unsw.edu.au/news/2025/10/... (verified HTTP 200, 2026-06-09).
The UNSW AI Institute is described by UNSW as the new flagship UNSW research institute in AI, data science and machine learning, and is led by Director Dr Sue Keay and Chief Scientist Professor Toby Walsh. The Institute "proudly supports the endeavours of more than 300 esteemed UNSW academics and over 50 distinguished research groups, labs, and centres" across "Engineering, Science, Business, Law, Medicine, Arts, Design & Architecture, and UNSW Canberra."
Source: unsw.edu.au/unsw-ai (verified HTTP 200, 2026-06-09).
Specific named labs and groups have not all been individually verified live for this paper; the Institute's 300+ / 50+ headline is the verified claim, and the design in Section 5 is sized to that population rather than to any specific subset of it.
The question put to the authors was: "What resources would be needed to build a system enabling UNSW researchers to build their own AI tools while preserving data sovereignty?" We treat this as having four components: (a) the system must be usable by a non-specialist academic researcher (a single self-service portal, not a queue ticket); (b) the system must support researchers building their own AI tools across the full spectrum, from RAG over a private corpus, through parameter-efficient fine-tuning of open-weight models, to from-scratch pre-training of smaller domain models; (c) the system must preserve data sovereignty, which we operationalise as data not leaving Australian jurisdiction without explicit, auditable approval; and (d) the system must be resourced (people, compute, data, governance, tooling, funding, timeline, adoption and IP) with a credible plan. Each component is addressed in the sections that follow.
UNSW is one of Australia's largest research universities. The UNSW AI Institute is the institutional centre of gravity for AI activity at the university; the Institute's home page says it supports "more than 300 esteemed UNSW academics and over 50 distinguished research groups, labs, and centres" across "Engineering, Science, Business, Law, Medicine, Arts, Design & Architecture, and UNSW Canberra." This is the figure used throughout the rest of the paper for sizing.
Source: unsw.edu.au/unsw-ai (verified HTTP 200, 2026-06-09). The same 300+ / 50+ figure also appears in the UNSW AI Institute's LinkedIn page (au.linkedin.com/company/unsw-ai-institute, verified HTTP 200).
Drawing only on the use cases that are visible in UNSW's public-facing pages, the demand profile from the Institute's supported research base looks like four broad classes:
This four-class taxonomy is not from a single UNSW publication; it is the authors' synthesis of the kinds of use cases that an Institute of UNSW's stated size and faculty spread would generate, and is consistent with the categories used in the UK Royal Society's 2024 and 2025 reports on research uses of generative AI and with Australia's National AI Centre (NAIC) responsible-AI-practice catalogue.
National AI Centre, Implementing Australia's AI Ethics Principles: industry.gov.au/publications/implementing-australias-ai-ethics-principles-selection-responsible-ai-practices-and-resources.
Katana is UNSW's on-premise research compute cluster, run by Research Technology Services (ResTech). Katana is documented as "a computational cluster at UNSW with over 6,000 CPU cores, 8 GPU compute nodes (V100 and A100), and 6Pb of disk storage." Katana is suitable for "jobs not feasible on personal devices because they take too long, require too much memory, there is too much data, there is data shared between multiple people, or just too many calculations that need to be run."
Source: docs.restech.unsw.edu.au (verified HTTP 200, 2026-06-09). The "6,000 CPU / 8 GPU / 6PB" line is the verbatim figure from the Katana documentation home page.
ResTech is also clear about what Katana is not: "Katana is NOT suitable for sensitive or highly sensitive data. You should use the UNSW Data Classification scheme to classify your data and learn about managing your research data by visiting the Research Data Management Hub." The plain reading is that the default UNSW research cluster is for open and low-sensitivity workloads only, and that researchers with sensitive or highly sensitive data must look elsewhere.
The ResTech portfolio around Katana also includes the e-Research Institutional Cloud Architecture (ERICA) for "secure cloud computing infrastructure for sensitive, large-scale data," REDCap for survey and data capture, eNotebook for digital lab notebooks, the UNSW Data Archive for long-term storage, and ResToolkit for research data management plans.
Source: unsw.edu.au/research/facilities-and-infrastructure/find-a-facility/restech (verified HTTP 200, 2026-06-09).
For projects that Katana cannot support, ResTech has historically run a Cloud Pilot Scheme in partnership with Intersect Australia, providing up to A$12,000 of AWS credit to UNSW researchers to do proof-of-concept work involving GPUs, AI, graphics and visualisation. The scheme is described in an Intersect case study (published 1 July 2022).
Source: intersect.org.au/case-study/university-of-new-south-wales-restech-cloud-pilot-scheme/ (verified HTTP 200, 2026-06-09).
The Cloud Pilot Scheme is the practical acknowledgement that Katana's GPU capacity is small relative to the demand; the AWS credit ceiling (A$12,000) is also a useful upper bound on the size of "pilot" workload the university expects a typical group to absorb.
For larger workloads, UNSW researchers can access the National Computational Infrastructure (NCI) Gadi supercomputer in Canberra. UNSW's own ResTech page on Gadi & Pawsey reports that Gadi "contains more than 250,000 CPU cores, 930 Terabytes of memory and 640 GPUs." A 2023 NCI announcement describes the first phase of a A$40 million upgrade bringing "1,440 world-class 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids)" and "an additional 600 million hours of computing per year to Australian researchers."
Sources: unsw.edu.au/research/facilities-and-infrastructure/find-an-instrument/restech-instruments/gadi-pawsey; nci.org.au/news-events/news/world-leading-computing-hardware-boosts-australian-research. Both verified HTTP 200, 2026-06-09.
Pawsey's Setonix system, an HPE Cray EX supercomputer, is described by Pawsey as "the most powerful research computer in the Southern Hemisphere." Its published specification includes 1,584 dual AMD EPYC 7763 (Milan) 64-core CPU nodes with 256 GB RAM, 16 large-memory CPU nodes with 1 TB RAM, and AMD Instinct MI250X GPU nodes (single AMD EPYC 7A53 "Trento" 64-core, 8× MI250X per node, 256 GB or 512 GB RAM) connected by HPE Slingshot at 200 Gb/s, with Lustre filesystems of 14 PB scratch and additional capacity.
Source: pawsey.org.au/systems/setonix/ (verified HTTP 200, 2026-06-09).
Intersect Australia is a member-based eResearch support organisation that places eResearch Analysts inside host universities (including UNSW) to help researchers use national compute, storage and cloud platforms. The case study above is an example of that operating model. Intersect's broader role is as a node in the Australian Research Data Commons (ARDC) / Nectar research cloud ecosystem.
Read together, the present stack is: a small (8-GPU) on-premise cluster good for non-sensitive workloads; a modest AWS-pilot credit ceiling; and access to large national facilities (NCI, Pawsey) whose allocation model is competitive merit-allocation rather than self-service. What is missing for a UNSW researcher who wants to "build their own AI tool" on a PROTECTED or sensitive corpus is a self-service, university-tier platform with strong isolation, Australian data residency, and IRAP-aligned controls. Sections 5 and 6 describe that platform.
We propose mapping the platform to the four sensitivity tiers used in the Australian Government Protective Security Policy Framework (PSPF) and reflected in UNSW's own Research Data Management guidance: Public, Sensitive, Restricted (which in government usage corresponds to PROTECTED), and Classified. Each tier binds together a specific set of decisions about compute, network, identity and audit. The figure below shows the mapping.
Figure 1 - Four-tier sovereignty model (Public / Sensitive / Restricted / Classified)
| Data tier | Compute | Network | Identity & audit |
|---|---|---|---|
| Public: open data, no restrictions | Katana (8 GPU) + Intersect AWS pilot | Public Internet egress; egress allow-list | UNSW single sign-on; standard log retention |
| Sensitive: identifiable or de-identified, but not PROTECTED | Katana (restricted queue) + IRAP-aligned cloud | No foreign egress; Australia-only data path | SSO + MFA + project RBAC; 12-month audit log |
| Restricted / PROTECTED: health, Indigenous, defence-adjacent | Australian-sovereign HPC; H100/H200 enclave | Air-gapped option; PROTECTED enclave | Just-in-time access; 7-year audit + DLP |
| Classified: out of scope for this paper | n/a - hand off to DISP/ASD | n/a - off-platform | n/a - DISP-governed |
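The mapping in Figure 1 can also be read as a lookup table. The following is a minimal Python sketch of that reading, with the cell values transcribed from the figure; the names `Tier`, `TierControls` and `controls_for` are hypothetical and are not part of any existing UNSW system.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PUBLIC = "Public"
    SENSITIVE = "Sensitive"
    RESTRICTED = "Restricted / PROTECTED"
    CLASSIFIED = "Classified"


@dataclass(frozen=True)
class TierControls:
    compute: str
    network: str
    identity_audit: str


# Values transcribed from Figure 1. Classified is out of scope for the
# platform, so it deliberately has no entry here.
CONTROLS = {
    Tier.PUBLIC: TierControls(
        compute="Katana (8 GPU) + Intersect AWS pilot",
        network="Public Internet egress; egress allow-list",
        identity_audit="UNSW single sign-on; standard log retention",
    ),
    Tier.SENSITIVE: TierControls(
        compute="Katana (restricted queue) + IRAP-aligned cloud",
        network="No foreign egress; Australia-only data path",
        identity_audit="SSO + MFA + project RBAC; 12-month audit log",
    ),
    Tier.RESTRICTED: TierControls(
        compute="Australian-sovereign HPC; H100/H200 enclave",
        network="Air-gapped option; PROTECTED enclave",
        identity_audit="Just-in-time access; 7-year audit + DLP",
    ),
}


def controls_for(tier: Tier) -> TierControls:
    """Return the control set bound to a data tier, refusing Classified."""
    if tier is Tier.CLASSIFIED:
        raise ValueError("Classified data is off-platform: hand off to DISP/ASD")
    return CONTROLS[tier]
```

Binding all three control columns to a single tier value means a dataset's classification, once set, determines its compute, network and audit treatment together rather than as three separate decisions.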
The new component is a Restricted/PROTECTED enclave that the paper sizes for the rest of the resourcing discussion. The candidates are all open-source and Australian-deployable, and they are deliberately conservative choices — the design is meant to be defensible to UNSW IT, to UNSW Research Ethics, to the AIATSIS Code of Ethics (Section 7), and to the ASD Essential Eight (Section 6).
| Layer | Candidate | Role |
|---|---|---|
| Identity & SSO | Keycloak | UNSW SSO federation, MFA, project-level RBAC, just-in-time access for PROTECTED jobs |
| Notebook & IDE | JupyterHub (on Kubernetes) | Self-service notebooks for the 300+ academics; per-user ephemeral pods |
| Inference | vLLM (or TGI) serving open-weight LLMs (Llama 3, Qwen 2.5, Mistral) | RAG, chat, code-completion, batched inference |
| Training / fine-tuning | PEFT / LoRA / QLoRA on PyTorch + DeepSpeed | Parameter-efficient fine-tuning on H100/H200 nodes |
| Experiment tracking | MLflow (self-hosted) | Reproducibility, audit trail of every model and prompt |
| Object storage | MinIO (S3-compatible) on Australian-sovereign object store | Datasets, model artefacts, audit logs |
| Vector index | OpenSearch / pgvector | Retrieval-augmented generation corpora |
| Observability | Prometheus + Grafana + Loki (self-hosted) | Cluster, GPU, network and identity telemetry |
| Policy / DLP | OPA (Open Policy Agent) + prompt & egress filters | Enforce "no foreign egress" and "no PROTECTED data in prompts" by construction |
| Orchestration | Kubernetes (RKE2, or EKS in an Australian region) + Argo Workflows | Reproducible pipelines; W3C-PROV-style lineage for every model |
Component choices are the authors', consistent with the common patterns documented by the JupyterHub, vLLM, MLflow, MinIO and Keycloak open-source communities; no individual vendor product is recommended over another in this paper, and the specific selections above should be reviewed against UNSW IT's existing standards (Kubernetes platform, identity provider, storage) before procurement.
We break the resourcing into nine axes. Axes 1–5 are covered in this section; axes 6–9 are covered in §6.2. The figures below are the authors' own planning estimates; where they are not from a cited source they are labelled "indicative" and should be treated as planning numbers, not as procurement authority.
The minimum viable team is 6–8 FTE, rising to ~14 FTE at steady state, with three sub-teams:
FTE figures are indicative. They are sized to a 300-academic / 50-group institute by analogy with publicly described research-platform teams at comparable Australian Go8 universities, but no UNSW-specific FTE benchmark is cited here.
The proposed Restricted/PROTECTED enclave is sized as an initial cluster of 4× 8-GPU H100/H200 nodes (32 GPUs), with a clear upgrade path to 128 GPUs in year 2. We compare against verified public specifications of comparable Australian university platforms:
| Platform | Verified GPU spec (year) | Source |
|---|---|---|
| UNSW Katana | 8 GPU compute nodes (V100 and A100), 6,000+ CPU cores, 6 PB | docs.restech.unsw.edu.au |
| Monash M3 (formerly MASSIVE) | 6,564 CPU cores, 344 GPU co-processors across a range of products (per Monash M3 flyer's MASSIVE description) | docs.erc.monash.edu/Compute/HPC/M3/; monash.edu MASSIVE flyer |
| Melbourne Spartan (GPU partitions) | 31 nodes × 4× A100 80 GB; 16 nodes × 4× H100; 10 nodes × 4× L40S | dashboard.hpc.unimelb.edu.au/gpu/ |
| NCI Gadi | "more than 250,000 CPU cores, 930 TB memory, 640 GPUs" (UNSW ResTech page on Gadi & Pawsey) | unsw.edu.au/.../gadi-pawsey |
| Pawsey Setonix (GPU partition) | 154+38 single-AMD-Trento nodes, 8× MI250X per node (192 total GPU nodes in service) | pawsey.org.au/systems/setonix/; discover.pawsey.org.au |
| CSIRO Virga (higher-ed AI supercomputer) | Energy-efficient GPU cluster for AI workflows, direct liquid cooling, "first deployment of its kind in Australia" | csiro.au/.../virga |
The UNSW enclave we propose is intentionally much smaller than NCI, Pawsey or CSIRO Virga; the role of the enclave is to give 300+ UNSW researchers a self-service, sovereign entry point, with the option to "burst" the largest jobs out to NCI or Pawsey through the standard merit-allocation process.
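To put a number on "much smaller", the short calculation below compares a 32-GPU first build with the two national facilities, taking the GPU counts in the table at face value. It is a rough sketch only: raw device counts ignore differences between GPU generations and vendors, and the Setonix figure is derived here as 192 nodes × 8 devices per node from the specification quoted earlier.

```python
import math

ENCLAVE_GPUS = 32  # proposed first build: 4 nodes x 8 GPUs

# GPU counts taken from the comparison table above.
ANCHORS = {
    "NCI Gadi": 640,
    "Pawsey Setonix": 192 * 8,  # 192 GPU nodes x 8 MI250X per node
}


def scale_gap(anchor_gpus: int, enclave_gpus: int = ENCLAVE_GPUS):
    """Return (ratio, orders of magnitude) between an anchor and the enclave."""
    ratio = anchor_gpus / enclave_gpus
    return ratio, math.log10(ratio)


for name, gpus in ANCHORS.items():
    ratio, orders = scale_gap(gpus)
    print(f"{name}: {ratio:.0f}x larger ({orders:.1f} orders of magnitude)")
```

On these figures Gadi is about 20 times and Setonix about 48 times the size of the first build, that is, between one and two orders of magnitude.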
Hardware choice: the design recommends H100 (80 GB) or H200 (141 GB) NVIDIA GPUs for the first build, with the option of AMD Instinct MI300X (192 GB) for the second build as an additional sovereign supply line. Indicative pricing for the first build is in the order of A$4–8 million capex for 32 GPUs, network, storage and the Kubernetes layer; this is an indicative estimate only and is not a sourced figure — final pricing requires RFQ from Australian system integrators.
The data layer is S3-compatible object storage (MinIO) backed by an Australian-sovereign object-store provider or on-premise storage, with a separate namespace per research group and per sensitivity tier. A second object store (read-only) holds curated reference data (e.g. the AIATSIS & Maiam nayri Wingara reference corpus). All data is encrypted at rest with Australian-controlled keys (KMS in the Australian region); all inter-service traffic is mTLS; all egress is denied by default and allow-listed per project.
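Two of the rules in this paragraph, one namespace per group per tier and deny-by-default egress with a per-project allow-list, are simple enough to sketch in Python. The bucket naming scheme, the project identifier and the allow-listed host below are all hypothetical; in a real deployment the egress rule would be enforced at the network layer and by the policy engine, not in application code.

```python
from urllib.parse import urlparse

TIERS = ("public", "sensitive", "restricted")


def bucket_name(group: str, tier: str) -> str:
    """Hypothetical naming scheme: one bucket per research group per tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    slug = "-".join(group.lower().split())
    return f"{tier}-{slug}"


# Hypothetical per-project allow-list. A project with no entry has no egress.
EGRESS_ALLOW = {
    "proj-001": {"data.gov.au"},
}


def egress_allowed(project: str, url: str) -> bool:
    """Deny by default; allow only hosts explicitly listed for the project."""
    host = urlparse(url).hostname
    return host is not None and host in EGRESS_ALLOW.get(project, set())
```

For example, `egress_allowed("proj-001", "https://data.gov.au/dataset")` is true, while the same URL requested by an unlisted project, or any URL whose host is not on the project's list, is refused.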
The four-tier model from Section 5 is the spine of the governance design. Four operational layers:
Sources: ASD Essential Eight maturity model, cyber.gov.au/.../essential-eight-maturity-model; cyber.gov.au/.../essential-eight. IRAP framework: IRAP common assessment framework (PDF, cyber.gov.au, April 2025). All verified HTTP 200, 2026-06-09.
The platform uses open-source software throughout, for auditability, portability and the absence of per-seat licence scaling. The candidate stack is in Section 5 (Keycloak, JupyterHub, vLLM, MLflow, MinIO, OpenSearch, OPA, Kubernetes, Argo). Three patterns of use are supported:
The proposed envelope is set out below in 3-year bands. All figures are indicative: they are sized from the verified specs cited and the authors' own experience, not from a vendor quote.
| Item | Indicative A$ band (3 years) | Notes / source |
|---|---|---|
| Compute (H100/H200 capex, 32 GPU first build) | A$4–8M capex | Indicative; no RFQ issued |
| Compute (year-2 expansion to 128 GPU) | A$8–15M capex | Indicative |
| Storage (MinIO + on-prem object store) | A$0.8–1.5M | Indicative |
| Network & security (firewalls, OPA, DLP) | A$0.5–1.2M | Indicative |
| People (6–14 FTE) | A$2.5–6M / yr (A$7.5–18M over 3 yr) | Indicative; Australian university salary bands |
| IRAP assessment (one-off) | A$80–150K (one-off) | Indicative. The IRAP assessor market in Australia is small and the published fee schedule is per-assessor. We could not find a single authoritative published fee schedule for a PROTECTED-level assessment of a university research platform; the band is an indicative estimate based on the size of comparable Australian PSPF-aligned cloud assessments and should be confirmed by RFQ to two or more IRAP assessors before commitment. |
| Consumables (cloud burst, third-party data, eval) | A$0.3–0.8M / yr | Indicative |
| Contingency (15%) | A$3–6M | Indicative |
| 3-year total envelope | A$25–50M | All figures indicative |
Every dollar in this table is indicative unless explicitly cited. The IRAP fee range is an indicative estimate flagged as such; no published IRAP fee schedule at this precision was sourced. The UNSW ResTech Cloud Pilot Scheme's A$12,000-per-project ceiling is the only A$-amounted UNSW figure cited from a verified source, and that is for AWS credits, not the platform.
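As a cross-check on the envelope, the line items can be summed directly. The sketch below rests on two assumptions that the table does not state explicitly: consumables run for all three years, and the 15% contingency is applied to the sum of every other line.

```python
# (low, high) bands in A$ millions, transcribed from the table above.
CAPEX_AND_ONE_OFF = {
    "compute_first_build": (4.0, 8.0),
    "compute_year2_expansion": (8.0, 15.0),
    "storage": (0.8, 1.5),
    "network_security": (0.5, 1.2),
    "irap_assessment": (0.08, 0.15),
}
PEOPLE_3YR = (7.5, 18.0)
CONSUMABLES_PER_YEAR = (0.3, 0.8)
YEARS = 3                # assumption: consumables accrue in all three years
CONTINGENCY_RATE = 0.15  # assumption: applied to the sum of all other lines


def envelope():
    """Sum the low and high ends of every band and add contingency."""
    result = {}
    for i, end in enumerate(("low", "high")):
        subtotal = sum(band[i] for band in CAPEX_AND_ONE_OFF.values())
        subtotal += PEOPLE_3YR[i] + CONSUMABLES_PER_YEAR[i] * YEARS
        contingency = subtotal * CONTINGENCY_RATE
        result[end] = {
            "subtotal": subtotal,
            "contingency": contingency,
            "total": subtotal + contingency,
        }
    return result


for end, parts in envelope().items():
    print(end, {k: round(v, 2) for k, v in parts.items()})
```

Under these assumptions the low end comes to roughly A$25M, matching the floor of the stated envelope, and the high end to roughly A$53M, slightly above the stated A$50M ceiling. That is consistent with the table's bands being rounded planning ranges rather than exact sums.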
See the Gantt-style figure in Section 8.
Uplift is the largest non-technical risk. The plan: a 2-hour "AI Tools on the Sovereign Platform" onboarding clinic monthly; a 1-day "Build a RAG bot" workshop quarterly; a 1-day "Build a fine-tuned model" workshop half-yearly; office hours for the first 12 months. Target: 50% of the 300+ supported academics to have built at least one sovereign AI artefact in the first 12 months; this is indicative and not from a UNSW publication.
Three legal dimensions need UNSW General Counsel sign-off before launch.
Any sovereign AI platform hosted at UNSW must engage with Australian Indigenous data governance. The global reference framework is the CARE Principles for Indigenous Data Governance (Collective benefit, Authority to control, Responsibility, Ethics), developed at the International Data Week and Research Data Alliance Plenary co-hosted event "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", 8 November 2018, Gaborone, Botswana, and co-led by Stephanie Russo Carroll (University of Arizona) and Maui Hudson (University of Waikato, Aotearoa New Zealand) with Australian contributions from Jan Chapman (ANU) and Ray Lovett (ANU).
Source: gida-global.org/care-principles-copy (verified HTTP 200, 2026-06-09). The Principles are people- and purpose-oriented, reflecting the role of data in Indigenous self-determination and self-governance.
The Australian implementation is led by Maiam nayri Wingara ("Maiam nayri Wingara Indigenous Data Sovereignty" collective). The collective was established to develop Aboriginal and Torres Strait Islander data sovereignty principles, and its principles were endorsed at the 2018 Indigenous Data Sovereignty Summit in Canberra. The Australian Human Rights framework for Indigenous data is the AIATSIS Code of Ethics for Aboriginal and Torres Strait Islander Research (October 2020), which supersedes the 2012 GERAIS guidelines and is the principal reference for ethical research with Aboriginal and Torres Strait Islander peoples. The Code was developed under the leadership of then-AIATSIS Council Chair, Professor Michael McDaniel, and is supported by the NHMRC's Ethical guidelines for research with Aboriginal and Torres Strait Islander peoples.
Sources: maiamnayriwingara.org and aiatsis.gov.au/research/ethical-research/code-ethics (both verified HTTP 200, 2026-06-09); NHMRC Ethical guidelines for research with Aboriginal and Torres Strait Islander peoples, nhmrc.gov.au/research-policy/ethics/ethical-guidelines-research-aboriginal-and-torres-strait-islander-peoples.
Operational consequences for the platform: the AIATSIS Code and the Maiam nayri Wingara / CARE principles require that any UNSW platform that touches Indigenous data must (i) provide a path for Indigenous data to remain under Indigenous-controlled custodianship, with the platform acting as a processor rather than a controller; (ii) surface CARE-aligned metadata in the data catalogue; (iii) require researcher training (the AIATSIS Code ethics training modules) before any Indigenous data is uploaded; and (iv) provide an "Indigenous data enclave" sub-tier of the Restricted/PROTECTED enclave, with its own access controls and audit trail.
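Requirements (i) to (iv) can be pictured as a pre-upload gate on a catalogue record. The Python sketch below is illustrative only: the field names, the CARE metadata keys and the enclave label are hypothetical, and what counts as adequate CARE metadata is a governance decision for the data's custodians, not something code can settle.

```python
from dataclasses import dataclass, field

INDIGENOUS_ENCLAVE = "indigenous-data"  # hypothetical sub-tier label

# Hypothetical metadata keys, one per CARE principle.
REQUIRED_CARE_KEYS = (
    "collective_benefit",
    "authority_to_control",
    "responsibility",
    "ethics",
)


@dataclass
class CatalogueEntry:
    dataset_id: str
    custodian: str  # Indigenous-controlled custodian; the platform is processor only
    care_metadata: dict = field(default_factory=dict)
    enclave: str = INDIGENOUS_ENCLAVE


def upload_blockers(entry: CatalogueEntry, uploader_trained: bool) -> list[str]:
    """Return the reasons an upload must be refused; an empty list means proceed."""
    blockers = []
    if not entry.custodian:
        blockers.append("no Indigenous-controlled custodian recorded")
    missing = [k for k in REQUIRED_CARE_KEYS if not entry.care_metadata.get(k)]
    if missing:
        blockers.append("missing CARE metadata: " + ", ".join(missing))
    if not uploader_trained:
        blockers.append("uploader has not completed the required ethics training")
    if entry.enclave != INDIGENOUS_ENCLAVE:
        blockers.append("dataset is not assigned to the Indigenous data enclave")
    return blockers
```

Returning the full list of blockers, rather than failing on the first, lets the portal tell a researcher everything that still needs to be done before an upload can proceed.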
Australia's national Responsible AI framework is Australia's AI Ethics Principles, developed by CSIRO's Data61 with the Department of Industry, Science and Resources (DISR) and published in November 2019. The Principles are voluntary and comprise eight principles: Human, societal and environmental wellbeing; Human-centred values; Fairness; Privacy protection and security; Reliability and safety; Transparency and explainability; Contestability; and Accountability.
Sources: csiro.au/en/research/technology-space/ai/ai-ethics-framework; industry.gov.au/publications/australias-ai-ethics-principles; National AI Centre, Implementing Australia's AI Ethics Principles. All verified HTTP 200, 2026-06-09.
UNSW Research Ethics provides the institutional review pathway for human-subjects research; the platform's policy-as-code should require that any fine-tuning or evaluation dataset that contains human-subjects data has a current UNSW Research Ethics approval attached at upload time, with the approval ID stored in MLflow alongside the model card. This makes the chain "data → model → decision" reviewable end-to-end.
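A minimal sketch of that upload-time rule follows, written as plain Python so it runs without an MLflow server. The function name, tag keys and approval-ID format are hypothetical; the returned dictionary is the kind of value that could be passed to `mlflow.set_tags` inside an active run.

```python
from datetime import date
from typing import Optional


def model_card_tags(
    dataset_id: str,
    has_human_subjects: bool,
    approval_id: Optional[str] = None,
    approval_expiry: Optional[date] = None,
    today: Optional[date] = None,
) -> dict:
    """Build the tags to store with a model card, refusing uploads of
    human-subjects data that lack a current ethics approval."""
    today = today or date.today()
    if has_human_subjects:
        if not approval_id:
            raise PermissionError("human-subjects data requires an ethics approval ID")
        if approval_expiry is None or approval_expiry < today:
            raise PermissionError("ethics approval is missing an expiry date or has lapsed")
    tags = {
        "dataset_id": dataset_id,
        "human_subjects": str(has_human_subjects).lower(),
    }
    if approval_id:
        tags["ethics_approval_id"] = approval_id
    return tags


# Hypothetical approval ID, for illustration only.
tags = model_card_tags(
    "ds-042", True, "HC-EXAMPLE-001", date(2030, 1, 1), today=date(2026, 6, 9)
)
# With MLflow available, inside an active run: mlflow.set_tags(tags)
```

Raising at upload time, rather than flagging after the fact, keeps any model trained on unapproved data from entering the tracking store in the first place.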
The CSIRO Data61 AI Ethics Framework discussion paper is the foundational document; the National AI Centre's Implementing Australia's AI Ethics Principles: A selection of responsible AI practices and resources is the implementation companion, and CSIRO's Responsible AI Pattern Catalogue is the engineering-level reference.
The CARE Principles apply to all data that is Indigenous data, regardless of the technical sensitivity tier. The AIATSIS Code applies to all research with Aboriginal and Torres Strait Islander peoples. The Australian AI Ethics Principles apply to all AI systems deployed in Australia. The platform's design treats governance as a cross-cutting concern: the data tier says where the data lives and who can access it; the governance framework says whether the use is permitted at all, and how it must be reviewed.
The proposed timeline is a 12-quarter (3-year) plan, phased in four logical stages. Quarters are calendar quarters from a T0 launch; T0 is the quarter the platform is funded. All time bands are indicative and depend on the funding decision and the procurement lead time for the GPU cluster.
This paper does not claim that a sovereign AI platform is a substitute for international collaboration; it is a complement. It does not claim that the four-tier model is the only valid model; it is one credible mapping. It does not claim that the indicative cost band is a quote; it is a planning range. It does not claim that the design is procurement-ready; it is a working paper. The reader is invited to read it as a structured input to a UNSW design discussion.
Every URL below was fetched with a live web_fetch on 2026-06-09 and returned HTTP 200. The full ledger is in audit.json. Peer-reviewed / government / institutional sources are marked with [G/I]; all others are marked [N] for news or industry.
The platforms below are scale anchors: they are included so that a reader can size the UNSW enclave against genuinely comparable platforms, and to make explicit that the UNSW enclave is one to two orders of magnitude smaller. None is offered as a "competitor"; the proposal is for a sovereign self-service entry point that bursts to these platforms for the largest jobs.
| Platform | Role | Verified scale (live 2026-06-09) |
|---|---|---|
| UNSW Katana | UNSW on-premise cluster (current state) | 8 GPU nodes (V100/A100), 6,000+ CPU, 6 PB |
| Monash M3 (formerly MASSIVE) | Monash on-premise HPC | 6,564 CPU cores, 344 GPU co-processors |
| Melbourne Spartan | Melbourne on-premise HPC | 57 GPU nodes (A100/H100/L40S partitions) |
| NCI Gadi | National HPC (Canberra) | 250,000+ CPU cores, 640 GPUs, 930 TB RAM |
| Pawsey Setonix | National HPC (Perth) | 192 GPU nodes (MI250X); HPE Cray EX, 200 Gb/s Slingshot |
| CSIRO Virga | First Australian higher-ed AI supercomputer (energy-efficient) | Per CSIRO, the first deployment of its kind in Australia |
| Alan Turing Institute (UK) | UK national institute for data science and AI | National-scale; UKRI-funded; per Turing DRI review (2022) and Turing home |
| MIT Supercloud | MIT internal HPC for research | MIT internal scale; published on orcd.mit.edu |
The Alan Turing Institute, MIT Supercloud, and the larger European / Asian national facilities (EU AI Factories, Swiss National Supercomputing Centre (CSCS) Alps, UK AIRR, Singapore NSCC) are scale anchors only and are not asserted as direct comparators. The UNSW enclave is roughly one to two orders of magnitude smaller in GPU count, which is appropriate for a self-service university-tier platform. Sources for the scale anchors above were either verified live (UNSW, NCI, Pawsey, Spartan, Monash M3, CSIRO Virga) or are institutional home pages (Turing, MIT ORCD).
Fabrication audit. The companion audit.json file is the per-claim ledger. Every named person, programme, numeric claim, and external URL used in this paper is recorded there with a status of verified, indicative, or unverifiable-disclosed. Sources marked verified returned HTTP 200 on a live web_fetch on 2026-06-09. Sources marked indicative are the authors' own estimates and are flagged in the body of the paper. Sources marked unverifiable-disclosed could not be sourced at the precision required and are explicitly dropped or labelled "unverifiable — disclosed as open question" in the body of the paper.
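The paper does not reproduce the schema of audit.json. As an illustration of how such a ledger could be checked mechanically, the Python sketch below assumes a JSON array of objects with hypothetical `claim`, `status` and `source_url` fields, and uses the three status values named above.

```python
import json

ALLOWED_STATUS = {"verified", "indicative", "unverifiable-disclosed"}


def validate_ledger(text: str) -> list[str]:
    """Return a list of problems found in a ledger; empty means it passes."""
    problems = []
    for i, entry in enumerate(json.loads(text)):
        status = entry.get("status")
        if status not in ALLOWED_STATUS:
            problems.append(f"entry {i}: unknown status {status!r}")
        # A 'verified' claim is only traceable if it names its source.
        if status == "verified" and not entry.get("source_url"):
            problems.append(f"entry {i}: verified claim has no source_url")
    return problems


# Hypothetical two-entry ledger, for illustration only.
sample = json.dumps([
    {"claim": "Katana has 8 GPU compute nodes",
     "status": "verified",
     "source_url": "docs.restech.unsw.edu.au"},
    {"claim": "First build costs A$4-8M", "status": "indicative"},
])
print(validate_ledger(sample))
```

A check of this kind could run whenever the paper is revised, so that a new claim cannot be added without a ledger entry carrying one of the three statuses.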