When you think of running code in the browser, you probably think of Binder or Google Colab. If you work in an AWS shop, you think of SageMaker.
That is the visible end of the digital science platform story: a notebook appears in a tab, Python imports something, a chart lands on the screen, and for a moment the whole research infrastructure problem looks solved. If you are a fan of a different paradigm, swap Python for R, or swap the programming language for a chat session whose side effects are science and engineering plans plus some code.
Then the real workflow arrives. The data is 100 TB of imagery from a high-altitude balloon, and it is sitting on the HPC system; you will have to wait in the queue for your job to start, but the proposal deadline is tomorrow and you really need preliminary exploration graphics to illustrate your story, since humans are inherently visual creatures. Do you fake it till you make it using Nano Banana, or do you commit to a fraction of the actual work you are applying for funding to do in full?
The HPC credentials are institutional. The analysis needs Dask workers near cloud object storage. It will take two hours for Globus to haul the data over to the nearest AWS Region. The output, when the grant is completed, should become a tile service, a STAC collection, a policy brief, a model run, and something someone can repeat six months later, when the next sensor collection lands after that much-needed CMOS and lens upgrade. For now, you really need that plot showing how a thermal mosaic aligns with visible-spectrum satellite imagery.
At that point the browser notebook is the doorway, not the platform. The doorway opens onto a howling void of needs and unfilled expectations.
The apartment, the hotel room, and the empty block
The continuum feels a bit like the Django versus Flask argument, except measured in satellite scenes, climate arrays, grant budgets, and institutional risk. If Django and Flask mean nothing to you, read them as opinionated versus minimalist.
A managed science namespace is the Flask or FastAPI option. You get a well-tested core, working access controls, shared storage, a scheduler, a route to scalable compute, and enough freedom to design your own universe. It is like buying an empty apartment in a good part of town. The walls are sound. The plumbing works. The rooms are yours, and you will not get robbed when you step outside.
A highly managed platform is closer to a hotel room at the Ritz. Everything is convenient, polished, and serviced. You get a desk, a bed, a view, and a button to press when something breaks. You probably cannot knock down a wall.
Then there is the empty block of land: Kubernetes, object storage, IAM, container registries, workflow engines, data catalogs, and a team with enough scars to make them all behave. Maximum control, maximum blast radius.
The interesting work sits between those extremes. Digital science platforms are not one product category. They are a gradient of convenience, control, reproducibility, and institutional memory.
The first split: interactive science versus production exploitation
Earth observation platforms and broader science platforms grew from different pressures.
Science platforms usually start with researchers. They need notebooks, environments, collaboration spaces, package installation, teaching material, shared compute, and a path from experiment to publication. The center of gravity is interactive work: Jupyter, RStudio, Dask, documentation, executable papers, and a community that can teach the next person how to reproduce the result.
EO exploitation platforms usually start with data gravity. They need catalogs, tiling, pixel access, area-of-interest filtering, atmospheric correction, process graphs, long-running jobs, and application packaging. The center of gravity is operational work: find the scenes, run the process near the archive, publish the output, and make the result portable across platforms.
In practice those two worlds are merging. The Pangeo stack wants cloud-native scientific computing. EOEPCA wants interoperable EO exploitation building blocks. Digital Earth platforms want curated national-scale data cubes. NASA-style platforms such as MAAP and VEDA want science teams to move from discovery to analysis to communication without rebuilding the pipes each time.
Choosing a winner is not the problem. Knowing which layer you are buying is.
A rough map of the ecosystem
The ecosystem makes more sense sorted by the job each platform does rather than by brand.
| Layer | Main job | Examples | What you gain | What you give up |
|---|---|---|---|---|
| Ephemeral notebooks | Launch a runnable environment from code | Binder, repo2docker-backed demos | Low friction, great teaching surface | Weak persistence, weak governance, limited scale |
| Hosted notebooks | Put compute in a managed browser workspace | Google Colab, Kaggle notebooks, SageMaker Studio Lab-style services | Fast start, familiar interface, low admin load | Runtime limits, cloudy provenance, platform-specific behavior |
| Community hubs | Give a research community a managed shared workspace | 2i2c hubs, JupyterHub deployments, Pangeo hubs | Shared identity, shared environments, scalable Dask, community ownership | Still needs platform stewardship and cost discipline |
| Open science practice layers | Teach teams how to work reproducibly | OpenScapes, Project Pythia, Jupyter Book, MyST | Social machinery for reproducible science | Not a compute platform by itself |
| Cloud-native geoscience stacks | Make large arrays workable near object storage | Pangeo, Xarray, Dask, Zarr, Kerchunk, Intake | Open, composable scientific computing | You assemble and operate the stack |
| Data cube and EO libraries | Organize analysis-ready EO collections | Open Data Cube, xcube, stackstac, odc-stac | EO-shaped data abstractions | Needs catalog, storage, and operations around it |
| Managed planetary platforms | Bring data catalog and compute together | Google Earth Engine, Microsoft Planetary Computer | Convenient data proximity and APIs | Platform dependence and varying escape hatches |
| Standards-based EO exploitation | Package processing for portability | openEO, OGC API - Processes, EOEPCA, CWL-style application packages | Repeatable jobs across back ends | More ceremony and specification work |
| Science mission platforms | Align infrastructure to a mission community | NASA MAAP, NASA VEDA, AquaWatch, ESA EarthCODE-style efforts | Domain-specific data, workflows, and communication | Mission boundaries shape what is easy |
| Institutional digital earths | Curate national or sectoral data products | Digital Earth Australia, Africa, Pacific, and related platforms | Trusted local data products, policy relevance | Sustained funding and data governance required |
That table is too tidy, of course. Real platforms straddle rows. A Pangeo hub can be a community hub, a cloud-native geoscience stack, and the working surface for a Digital Earth program. Earth Engine is both a managed planetary platform and a workflow language. EOEPCA is both architecture and open-source building blocks. A Digital Earth platform can host notebooks, APIs, dashboards, STAC catalogs, and products built from Open Data Cube.
The taxonomy still helps because “we need a science platform” can mean a teaching hub, a production EO processing system, a mission workbench, a public communication portal, or a Kubernetes tenancy model for research groups. Those are different purchases.
Pangeo is a community before it is a stack
Pangeo is often described through its software: Xarray for labelled N-dimensional arrays, Dask for parallel compute, Zarr for chunked cloud-friendly storage, Jupyter for interactive work, Kerchunk for virtualizing old file formats, Intake for catalogs, hvPlot for quick visualization, XPublish for serving datasets, and newer bounded-memory or serverless experiments such as Cubed and Xarray-Beam.
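To make “chunked cloud-friendly storage” a little more concrete, here is a minimal pure-Python sketch of the arithmetic that Zarr and Dask both lean on: how an N-dimensional array splits into a grid of chunks, and how large each piece is. The array shape, chunk sizes, and float32 item size are illustrative assumptions, not taken from any real dataset.

```python
import math

def chunk_grid(shape, chunks):
    """Number of chunks along each axis; edge chunks may be partial."""
    return tuple(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))

def chunk_stats(shape, chunks, itemsize):
    """Chunk grid, chunk count, bytes per full chunk, and total bytes (all uncompressed)."""
    grid = chunk_grid(shape, chunks)
    n_chunks = math.prod(grid)
    full_chunk_bytes = math.prod(chunks) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return grid, n_chunks, full_chunk_bytes, total_bytes

# Illustrative (time, y, x) float32 stack: 365 daily scenes of 10,980 x 10,980 pixels,
# stored as chunks of one time step by 2,048 x 2,048 pixels.
grid, n_chunks, chunk_bytes, total_bytes = chunk_stats(
    shape=(365, 10_980, 10_980), chunks=(1, 2_048, 2_048), itemsize=4
)
print(grid, n_chunks, chunk_bytes / 2**20, round(total_bytes / 2**40, 2))
# → (365, 6, 6) 13140 16.0 0.16
```

Chunk shape is the main design lever. In Zarr v2 each chunk is stored as a separate object, so very small chunks mean many storage requests, while very large ones mean each Dask task reads more than it needs. Treat any target chunk size as something to benchmark against your access pattern rather than a fixed rule.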
What goes into the requirements or constraints file of a notebook repository is all well and good, but like all code, its value depends on the people who read and run it. Anne Frank’s diary would have meant little had it never been found and read. The written word in any form, be it a notebook or a painful diary, reaches its true potential when it transmits the ideas contained therein to the minds of thousands of others. Pangeo describes itself as a community for open, reproducible, scalable geoscience. Project Pythia is its education working group and a training hub for the geoscientific Python community. The infrastructure, documentation, showcase talks, working groups, cookbooks, and foundation tutorials are part of the platform. Project Pythia can be found, and it has a lot of words, but it has no concrete infrastructure to run on: no printing press to replicate the proposed patterns and keep publishing them across planning cycles.
Many institutional platform programs miss this. You can install JupyterHub and Dask Gateway in a week. You cannot apt install a culture of reproducible workflows, documented environments, data access patterns, and examples that match the local science questions.
2i2c and OpenScapes occupy this gap. 2i2c turns the hub into managed open infrastructure with access control, custom environments, scalable compute, and a “Right to Replicate” philosophy, which is a useful antidote to the managed-service trap. OpenScapes works on the human operating system: habits, onboarding, documentation, and shared practice. The difference shows up when one champion leaves and the platform either survives or becomes a quiet monthly cloud bill that everyone questions every quarter, until someone shuts it down in true “Killed by Google” fashion. I have shut down a service or two in my time as well.
Earth Engine is the polished hotel room
Google Earth Engine remains the reference point for convenient planetary-scale EO analysis. It combines a multi-petabyte public catalog, server-side geospatial computation, JavaScript and Python APIs, and a browser code editor that made remote sensing feel weirdly immediate.
The trade-off is the same one that makes it attractive. You get the hotel room. You do not own the building.
For many users that is exactly right. If the job is land-cover change detection, trend mapping, rapid prototyping, teaching, or humanitarian analysis, Earth Engine collapses a brutal infrastructure problem into an API, and it has unlocked a large amount of science and public-interest work.
A national agency or regulated enterprise eventually asks harder questions. Can we run this against restricted internal data? Can we reproduce the process outside the platform? Can the cost model survive operational use? Can the workflow be audited? Can the output become an internal service rather than a script in someone else’s editor? What if Google decides the economics no longer work and Earth Engine gets “Killed by Google”?
Those questions mark the boundary between convenience and control.
Planetary Computer, STAC, and the cloud-native middle
Microsoft Planetary Computer pushed a different shape: open catalogs, STAC metadata, cloud-hosted environmental datasets, Hub-style analysis environments, and APIs that fit the broader Python geospatial ecosystem. It did not only put data in the cloud; it made the catalog and assets legible to ordinary tools.
That middle ground is where STAC, Cloud Optimized GeoTIFF, Zarr, GeoParquet, and object storage conventions earn their keep. STAC calls itself a common language for geospatial information, with a core JSON structure for describing and cataloguing spatiotemporal assets. That sounds dull until you have inherited five satellite archives, three naming conventions, and one research assistant who left for a postdoc in Bremen.
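For readers who have not met STAC, here is a minimal sketch of that “core JSON structure” for a single scene, built as a plain Python dict, plus a light check for the top-level fields the STAC Item specification requires. The scene ID, coordinates, timestamp, and asset URL are invented; a real pipeline would more likely use a library such as pystac and validate against the official JSON Schemas.

```python
# Top-level fields a STAC Item must carry (bbox is required whenever geometry is not null).
REQUIRED_ITEM_FIELDS = {"type", "stac_version", "id", "geometry", "bbox", "properties", "links", "assets"}

def make_item(item_id, bbox, datetime_utc, cog_href):
    """Build a minimal STAC Item dict for one rectangular scene with a single COG asset."""
    west, south, east, north = bbox
    # GeoJSON exterior ring, counterclockwise and closed (first point repeated at the end).
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "bbox": [west, south, east, north],
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "thermal": {
                "href": cog_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }

def missing_fields(item):
    """List required fields absent from an Item (a light check, not full schema validation)."""
    missing = REQUIRED_ITEM_FIELDS.difference(item)
    if "datetime" not in item.get("properties", {}):
        missing.add("properties.datetime")
    return sorted(missing)

item = make_item(
    "balloon-thermal-0001",                  # hypothetical scene ID
    (149.0, -35.4, 149.2, -35.2),            # hypothetical [west, south, east, north] in degrees
    "2024-01-15T03:00:00Z",
    "s3://example-bucket/thermal/0001.tif",  # hypothetical COG location
)
print(missing_fields(item))  # → []
```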
These are the unglamorous freight standards of cloud-native geospatial work. Once data is published in these shapes, platforms compete on experience and operations instead of trapping users at the file boundary.
The same applies inside Digital Earth programs. Open Data Cube, odc-stac, xcube, stackstac, and Pangeo tooling become more powerful when the data model is boring and inspectable. Boring is good. Boring is how a wet-season flood product, a crop monitoring workflow, and a methane detection experiment can reuse the same infrastructure without pretending they are the same science.
openEO and EOEPCA are the uncomfortable middle
At the other end from free-form notebooks are workflow and process standards.
openEO gives users a unified API for Earth observation cloud back ends. It is a process graph world: define operations, submit work, let the back end execute near the data, and avoid rewriting the same analysis for every platform. In 2026 openEO became an OGC Community Standard, which is a useful signal that the idea has moved beyond a single project vocabulary.
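As a rough illustration of the “process graph world”, here is a temporal mean composite written as openEO-style process graph data: nodes reference each other with `from_node`, and the reducer is a nested callback that receives its input via `from_parameter`. The collection ID, band, extent, and output format name are assumptions, since real values depend on each back end. The small checker is hypothetical and not part of any openEO client; in practice the openeo Python client builds graphs like this for you.

```python
# A minimal openEO-style process graph: load -> temporal mean -> save.
# "SENTINEL2_L2A", "B04", and "GTiff" are assumed names; back ends publish their own.
process_graph = {
    "load1": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",
            "spatial_extent": {"west": 149.0, "south": -35.4, "east": 149.2, "north": -35.2},
            "temporal_extent": ["2024-01-01", "2024-03-31"],
            "bands": ["B04"],
        },
    },
    "reduce1": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "load1"},
            "dimension": "t",
            "reducer": {
                "process_graph": {
                    "mean1": {
                        "process_id": "mean",
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
    },
    "save1": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "reduce1"}, "format": "GTiff"},
        "result": True,
    },
}

def check_graph(graph):
    """Hypothetical sanity check: list result nodes and any from_node reference
    that points at a node missing from this level of the graph."""
    results = [name for name, node in graph.items() if node.get("result")]
    refs = []

    def walk(value):
        if isinstance(value, dict):
            if "process_graph" in value:
                return  # callbacks such as the reducer have their own node namespace
            if "from_node" in value:
                refs.append(value["from_node"])
            for v in value.values():
                walk(v)
        elif isinstance(value, list):
            for v in value:
                walk(v)

    for node in graph.values():
        walk(node.get("arguments", {}))
    dangling = [ref for ref in refs if ref not in graph]
    return results, dangling

print(check_graph(process_graph))  # → (['save1'], [])
```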
OGC API - Processes handles a related need: wrap computational tasks as executable processes that a server can offer through a JSON-over-HTTP Web API. It is the unromantic contract that lets processing become a service instead of a notebook cell someone has to rerun by memory.
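To show how unromantic that contract is, here is a sketch that builds, but does not send, an execute request in the shape OGC API - Processes Part 1 describes: a POST to `/processes/{processId}/execution` with a JSON body of `inputs`, plus a `Prefer: respond-async` header asking the server to run it as a job. The server URL, process ID, and input names are hypothetical.

```python
import json
import urllib.request

def build_execute_request(base_url, process_id, inputs, asynchronous=True):
    """Build (without sending) an OGC API - Processes execute request."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if asynchronous:
        # Ask the server to create a job rather than hold the connection open.
        headers["Prefer"] = "respond-async"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_execute_request(
    "https://processing.example.org/",                                   # hypothetical server
    "surface-water-anomaly",                                             # hypothetical process ID
    {"aoi": {"bbox": [149.0, -35.4, 149.2, -35.2]}, "season": "wet"},    # hypothetical inputs
)
print(req.get_method(), req.full_url)
# → POST https://processing.example.org/processes/surface-water-anomaly/execution
```

Sending it would be a single `urllib.request.urlopen(req)` call, followed by polling the job status resource the server points you to. The point is that the notebook cell becomes a request anyone with credentials can replay.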
EOEPCA then packages a broader exploitation platform architecture around this world: reusable building blocks, open standards, federation between EO cloud platforms, application packaging, identity, data access, processing, and publication. Its mission is practical: enhance interoperability between cloud-based EO platforms and unify the fragmented cloud ecosystem for ground segment, EO science, R&D, and applications.
This middle is uncomfortable because it asks scientists to describe workflows with more ceremony than a notebook while asking infrastructure teams to expose more flexibility than a fixed portal. That ceremony earns its keep only when portability, audit, repeatability, and multi-platform execution are real requirements.
MAAP, VEDA, and AquaWatch show mission-shaped patterns
NASA’s Multi-Mission Algorithm and Analysis Platform (MAAP), Visualization, Exploration, and Data Analysis (VEDA), and AquaWatch point at another category: mission-shaped science platforms.
MAAP is built around collaborative algorithm development and analysis for biomass, carbon, and related mission data. It combines data, algorithms, and cloud computation for Earth science work at scale, including GEDI, ICESat-2, BIOMASS, NISAR, and related mission products. Its centre is not a generic notebook. It is a community of practice around specific data products, algorithms, and validation work.
VEDA is more communication- and exploration-oriented: science data, dashboards, stories, APIs, and applications that help users move from dataset to visible public insight. Its public dashboard talks about cloud-enabled analysis without writing code or downloading data, plus stories for science communication. In the geospatial agent stack draft, I have been calling this the presentation and evidence layer: not just compute, but a way to package outputs for humans without losing provenance.
AquaWatch is the mission model closer to my own recent work: water quality intelligence rather than a generic EO workbench. Its platform shape is dictated by the problem. Satellite products, in-situ observations, calibration and validation workflows, data services, state-government users, and operational water decisions all have to sit in one fabric. Nobody wants an elegant notebook if the harmful algal bloom window has already closed and the river manager is still waiting for a plot.
Mission platforms should not be embarrassed by their specificity. A platform that knows its science domain can make strong defaults: data already mounted, examples already relevant, units already sensible, common plots already nearby, and metadata that matches the decisions people actually make.
Digital Earth platforms are institutional memory
Digital Earth platforms are a cousin of science platforms, but with more institutional weight.
Digital Earth Australia, Digital Earth Africa, Digital Earth Pacific, and related efforts are not merely places to run notebooks. They are long-running data product factories, national or regional knowledge systems, and trust infrastructure. They turn raw satellite archives into analysis-ready data and derived products that policy, environment, agriculture, water, and disaster teams can use.
The local flavours carry the point. Digital Earth Australia talks about trusted imagery and freely available analysis-ready datasets for researchers, land managers, agriculture, and emergency management. Digital Earth Africa frames the work as decision-ready products for sustainable growth across social, environmental, and economic challenges. Digital Earth Pacific is explicitly about decision-making for Pacific peoples at EO scale, with regional products, dashboards, an analytical hub, data, community, and governance in the same frame. This is not just a different skin on the same JupyterHub.
If you need a global flavour, you can check out CSIRO’s EASI, delivered under the AquaWatch program. It flips the regional data cube approach to provide a global data cube from STAC using cloud-native primitives. It is almost a build-your-own Google Earth Engine, with the associated DevOps and platform engineering burdens as the price of admission.
That is a different job from a research notebook service. The value is not that one scientist can compute something, but that many people can rely on the same corrected, versioned, documented product line. Open Data Cube sits under many of these programs as an open-source way to manage analysis-ready data and has been used for continental-scale products such as Australian land-cover mapping from decades of Landsat imagery.
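Building a data cube “from STAC” starts with a selection step roughly like the one below: keep the items that overlap an area and a time window, then group them by acquisition date so each group can become one time slice. This is a simplified pure-Python illustration with invented items, not how EASI, Open Data Cube, or odc-stac are implemented; those tools also handle projections, resampling, and lazy loading.

```python
from datetime import datetime, timezone

def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap (antimeridian crossing ignored)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def parse_utc(stamp):
    """Parse an RFC 3339 timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

def group_time_slices(items, bbox, start, end):
    """Select STAC-like items overlapping bbox with start <= datetime < end,
    grouped by acquisition date so each group could become one cube time slice."""
    slices = {}
    for item in items:
        when = parse_utc(item["properties"]["datetime"])
        if bbox_intersects(item["bbox"], bbox) and start <= when < end:
            slices.setdefault(when.date().isoformat(), []).append(item["id"])
    return dict(sorted(slices.items()))

def fake_item(item_id, bbox, stamp):
    """Hypothetical helper producing just the Item fields this sketch needs."""
    return {"id": item_id, "bbox": bbox, "properties": {"datetime": stamp}}

# Invented items: two tiles from one pass, one elsewhere, one outside the time window.
items = [
    fake_item("tile-a", [148.9, -35.5, 149.1, -35.3], "2024-01-15T03:00:00Z"),
    fake_item("tile-b", [149.1, -35.3, 149.3, -35.1], "2024-01-15T03:00:05Z"),
    fake_item("tile-c", [115.0, -32.5, 115.2, -32.3], "2024-01-15T02:00:00Z"),
    fake_item("tile-d", [149.0, -35.4, 149.2, -35.2], "2024-03-02T03:00:00Z"),
]
window = (datetime(2024, 1, 1, tzinfo=timezone.utc), datetime(2024, 2, 1, tzinfo=timezone.utc))
print(group_time_slices(items, [149.0, -35.4, 149.2, -35.2], *window))
# → {'2024-01-15': ['tile-a', 'tile-b']}
```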
Here the apartment analogy changes. A Digital Earth platform is not an apartment you furnish. It is an apartment building with strata rules, fire exits, a maintenance schedule, and residents who will complain if the lift stops working. I have been the building super on three such platforms now.
What must be portable?
Most platform debates get stuck on user interface preference. Notebooks versus portals. Python versus JavaScript. Managed service versus Kubernetes. Proprietary convenience versus open stack purity. The debate that matters is about what must be portable.
If only the result is needed, a polished managed platform may be perfect. If the method must move between agencies, clouds, and missions, then STAC, openEO, OGC API - Processes, CWL-style packaging, and open data formats become non-negotiable. If the community must outlive one grant, then docs, examples, governance, and training are platform features. If restricted data is involved, then identity, audit, and network controls are part of the science platform, not an enterprise upsell (don’t get me started on the SSO premium most SaaS platforms charge).
There are at least five kinds of portability hiding inside this one word:
- Data portability: can assets move or be read by standard tools?
- Workflow portability: can the analysis run on a different back end?
- Environment portability: can the software stack be rebuilt?
- Evidence portability: can another person inspect how the result was produced?
- Community portability: can another team inherit the practice without oral tradition?
The best platform choice depends on which of those you actually need.
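One way to make those five questions operational is to record them as an explicit checklist per workflow and compare what the workflow requires with what a platform option provides. The helper below is a hypothetical sketch, not an established tool, and the example profile for a hosted notebook is an invented judgment for illustration only.

```python
from enum import Enum

class Portability(Enum):
    DATA = "can assets move or be read by standard tools?"
    WORKFLOW = "can the analysis run on a different back end?"
    ENVIRONMENT = "can the software stack be rebuilt?"
    EVIDENCE = "can another person inspect how the result was produced?"
    COMMUNITY = "can another team inherit the practice without oral tradition?"

def portability_gaps(required, provided):
    """Return the portability kinds a workflow needs but a platform option does not offer."""
    return sorted((p for p in required if p not in provided), key=lambda p: p.name)

# Invented example: a cross-agency flood product weighed against a hosted-notebook option.
needs = {Portability.DATA, Portability.WORKFLOW, Portability.EVIDENCE}
hosted_notebook = {Portability.ENVIRONMENT}
for gap in portability_gaps(needs, hosted_notebook):
    print(gap.name, "-", gap.value)
```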
Agents make the old platform boundary fuzzier
Agentic tooling is the next complication.
Once agents can call a STAC API, resolve a place name, submit an openEO job, monitor an Argo Workflow, query a Digital Atlas, open a notebook, and draft a VEDA-style story, the visible platform recedes behind the contracts underneath it.
Portals do not disappear. The durable value moves down into services, catalogs, workflow engines, identity boundaries, and audit trails, and the UI becomes one client among many.
This is where EO and enterprise AI rhyme. A user should be able to ask a normal question:
Show me whether wet-season surface water around this catchment is outside the last ten-year range, and give me the data sources and caveats.
The system behind that question might need a gazetteer, a STAC catalog, a data cube, Dask workers, an openEO process graph, a workflow engine, a notebook artifact, and a short written summary. Calling all of that a chatbot misses the point. It is a digital science platform with a language-shaped front door.
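A hypothetical sketch of that pattern: the front door turns the question into an ordered chain of tool calls, each step sees the results so far, and every step is appended to a provenance record that travels with the answer. The tool names and return values are placeholders standing in for a gazetteer, a STAC search, and an anomaly process, not calls to any real service.

```python
from dataclasses import dataclass, field

@dataclass
class Provenance:
    """Append-only record of which tool ran, what it could see, and a summary of its output."""
    steps: list = field(default_factory=list)

    def record(self, tool, visible_inputs, output_summary):
        self.steps.append({"tool": tool, "inputs": visible_inputs, "output": output_summary})

def run_chain(question, tools, prov):
    """Run tools in order, threading each result into a shared context and logging every step."""
    context = {"question": question}
    for name, fn in tools:
        result = fn(context)
        # Log the context keys this step could read, plus a truncated summary of its output.
        prov.record(name, sorted(context), repr(result)[:60])
        context[name] = result
    return context

# Placeholder tools; a real system would call a gazetteer, a STAC API, an openEO back end, etc.
tools = [
    ("gazetteer", lambda ctx: {"bbox": [149.0, -35.4, 149.2, -35.2]}),
    ("stac_search", lambda ctx: ["scene-001", "scene-002"]),
    ("anomaly", lambda ctx: {"outside_10yr_range": False, "n_scenes": len(ctx["stac_search"])}),
]
prov = Provenance()
answer = run_chain("Is wet-season surface water outside the ten-year range?", tools, prov)
print(answer["anomaly"], [step["tool"] for step in prov.steps])
```

The design choice worth noticing is that provenance is recorded by the orchestrator, not by each tool, so a tool that forgets to log cannot silently drop out of the evidence trail.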
Science Platform advantage
There is a less glamorous question hiding under the agent story: who operates and pays for the tooling and compute fabric the agents call?
If a user asks the wet-season surface-water question and an agent fans out across a slew of tools, somebody owns every link in that chain. Somebody pays for object storage requests, scheduler time, network egress, GPU or CPU minutes, model tokens, human review, and the maintenance of the small APIs that make the orchestration possible.
A serious science platform needs weight propagation in both directions. Backward propagation attributes each output to the tools, datasets, people, grants, teams, and compute fabric it consumed. Forward propagation estimates what the next run will cost before the button is pressed. Without both, whoever owns the customer-facing agent gets all the glory and keeps all the revenue, while the ecosystem sustaining it withers on the vine and, in the long run, the agentic tool ends up a toy.
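A minimal sketch of what weight propagation in both directions could look like in code, under deliberately simple assumptions: each tool call in a run is logged with an owner and a cost, backward attribution sums a finished run’s cost per owner, and forward estimation prices a planned run from the mean historical cost of each tool. The tools, owners, and dollar figures are invented; real accounting would also need shared-cost allocation, staff time, and uncertainty.

```python
from collections import defaultdict

# Hypothetical log of one completed agent run: (tool, owner, cost in dollars).
completed_run = [
    ("stac_search", "catalog-team", 0.02),
    ("dask_compute", "platform-team", 3.40),
    ("llm_summary", "model-vendor", 0.15),
    ("dask_compute", "platform-team", 1.10),
]

def attribute_backward(run):
    """Backward: split a finished run's cost across the owners whose tools it consumed."""
    totals = defaultdict(float)
    for _tool, owner, cost in run:
        totals[owner] += cost
    return {owner: round(cost, 2) for owner, cost in totals.items()}

def estimate_forward(plan, history):
    """Forward: price a planned run (tool -> number of calls) from mean historical cost per call.
    A tool with no history raises KeyError; a real estimator would need a prior."""
    per_tool = defaultdict(list)
    for tool, _owner, cost in history:
        per_tool[tool].append(cost)
    mean_cost = {tool: sum(costs) / len(costs) for tool, costs in per_tool.items()}
    return round(sum(mean_cost[tool] * calls for tool, calls in plan.items()), 2)

print(attribute_backward(completed_run))
print(estimate_forward({"stac_search": 1, "dask_compute": 4, "llm_summary": 1}, completed_run))
```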
This is where the phrase “Science Platform advantage” starts to be useful, in the same way people talk about quantum advantage. The claim is not that a platform is clever. The claim is that a connected, orchestrated chain of task-specific tooling, assembled on demand and cost-tracked properly, delivers more value than doing the same work any other way.
For the utilitarians in the room, that claim needs a benchmark. Pose a real problem. Solve it fully the old way first: portal clicks, downloads, hand-written scripts, email threads, queue waits, manual plots, confused provenance, and the final slide dropped into a proposal at 11:47 pm. Then build the agentic way and run the same class of problem a thousand times, so the cost of building the new fabric is amortized across many analyses rather than charged to the first heroic demo.
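The amortization argument reduces to simple arithmetic. If the old way costs c_old per analysis and the agentic fabric costs B to build plus c_new per run, the new way wins after n runs once B + n × c_new < n × c_old, that is, once n > B / (c_old − c_new). The sketch below computes that break-even count; every figure in it is a placeholder, precisely because the old-way cost is so rarely measured.

```python
import math

def break_even_runs(build_cost, old_cost_per_run, new_cost_per_run):
    """Smallest whole n with build_cost + n * new < n * old, or None if the new way never pays back."""
    saving_per_run = old_cost_per_run - new_cost_per_run
    if saving_per_run <= 0:
        return None  # each run costs at least as much as before, so the build cost is never recovered
    return math.floor(build_cost / saving_per_run) + 1

# Placeholder figures: $250k to build the fabric, $1,200 per analysis the old way, $150 the new way.
print(break_even_runs(250_000, 1_200, 150))  # → 239
```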
The trouble is that no one was keeping clean accounts while we did it the old way. We rarely know how much the portal-clicking cost once staff time, failed runs, duplicate downloads, local storage, interpretation drift, and meetings are counted. So we cannot honestly promise cheaper or faster without measuring both sides. We can be more certain about the energy bill: silicon currently delivers far less useful scientific reasoning per watt than the human brain fed on caffeine and croissants.
That does not weaken the platform argument. It makes the accounting part of the platform. The useful question is not “can an agent do this?” The useful question is whether the platform can tell us what it did, who paid, who benefited, and whether the next thousand runs beat the old way by enough to justify the extra silicon. Will the environmental solution we are building help preserve the environment, or become yet another nail in the anthropogenic coffin we are putting around our planet?
My current bias
I lean toward the apartment model.
Give science teams a managed namespace with identity, storage, standard catalogs, scalable compute, observability, and a small number of blessed paths to production. Keep the walls movable. Let the team bring Pangeo, Open Data Cube, xcube, DuckDB, GDAL, QGIS plugins, or specialist models where the science demands it. Make the durable contracts boring: STAC, COG, Zarr, GeoParquet, openEO, OGC APIs, container images, reproducible environments, and documented workflows.
Use the hotel room when speed beats ownership. Use the empty block only when the institution is ready to become a platform operator. Everything else is a furnishing choice.
EO platforms and science platforms are not rival tribes. They are different answers to the same infrastructure question: how much freedom do you need, how much responsibility can you carry, and what parts of the science must still make sense when the browser tab is closed?
Sources and threads to pull next
- Pangeo and Project Pythia for the open, reproducible, scalable geoscience community and its education layer.
- 2i2c and OpenScapes for managed open infrastructure, community hubs, right-to-replicate thinking, and reproducible research practice.
- STAC, OGC API - Processes, openEO, and EOEPCA for catalog, process, workflow, and platform-interoperability contracts.
- Google Earth Engine as the canonical managed planetary-scale EO analysis environment.
- Microsoft Planetary Computer as the open-catalog, cloud-hosted environmental data, and scientific API pattern.
- NASA MAAP, NASA VEDA, and AquaWatch as mission-shaped platform patterns around collaboration, analysis, visualization, communication, and operational water-quality intelligence.
- Digital Earth Australia, Digital Earth Africa, Digital Earth Pacific, CSIRO EASI, and Open Data Cube for institutional data product platforms and analysis-ready data infrastructure.
