Hitachi Content Platform — An Authoritative Architecture Reference

§ 01

What HCP Actually Is

Hitachi Content Platform is an object storage system, and the word system is doing real work in that sentence. HCP is not a bucket-on-a-box. It is a distributed, policy-driven content store that takes responsibility for an object from the moment of ingest through its entire retention lifecycle — protection, placement, indexing, immutability, replication, expiry — and exposes that object through whatever protocol the consuming application happens to speak. The platform's defining claim is durability and governed retention at scale, not raw throughput, and every architectural decision inside it follows from that priority.

In the current Hitachi Vantara portfolio, HCP no longer stands alone. It sits beneath the VSP One umbrella that consolidates block, file, and object behind a single data-platform strategy. Practically, "HCP" today refers to a family rather than a single product:

HCP (classic / G-series): The long-lived, namespace-and-tenant content platform with the full retention, compliance, and metadata feature set.
HCP for Cloud Scale: The containerized, S3-native scale-out variant built for billions of objects and high-concurrency S3 workloads.
HCP S-Series: The dense, erasure-coded capacity back end that classic HCP (and other front ends) tier to.
VSP One Object: The current-generation object tier, re-architected on a distributed SQL metadata engine and designed to integrate with QLC-based VSP One Block midrange systems.

These share lineage and protocol surface but differ sharply in internal design. Conflating them is the single most common mistake made by people who have read about HCP but not run it. The rest of this document treats them as the distinct architectures they are.

§ 02

The Logical Object Model

Everything in classic HCP descends from a three-tier logical hierarchy, and understanding it is the price of admission to every administrative decision that follows.

System: The cluster itself — the physical or virtual nodes, the system-level administrator, the global configuration. One system can manage well into the exabyte range and hundreds of billions of objects in a single namespace footprint.
Tenant: An administrative and security boundary. A tenant is a self-contained slice of the system with its own administrators, its own authentication configuration, its own quota, and its own chargeback accounting. Tenants exist so that a service provider — or an internal platform team behaving like one — can hand a business unit genuine autonomy without exposing the system layer. The default tenant exists from day one; everything else is deliberate.
Namespace: The actual object container, the closest analogue to an S3 bucket but with considerably more governance attached. A namespace owns its own data protection level (DPL), its retention mode, its versioning behavior, its protocol enablement, its service plan, and its access-control posture. Objects live in namespaces. Policy lives in namespaces. This is where the platform's character is configured.
Object: The stored entity itself, which in HCP is never just the bytes. Every object carries fixed-content data plus system metadata (size, hashes, ingest time, retention state, shred flag) and optional custom metadata — arbitrary XML or other annotations the application attaches at write time. That custom metadata is a first-class citizen: it is indexed, it is queryable, and it is the mechanism by which HCP behaves like a content repository rather than a dumb blob store. Objects can be versioned, placed under retention (WORM), held by legal hold, and marked for secure shredding on disposition.
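The hierarchy above can be pictured as nested containers. The following is a minimal sketch for orientation only: the class names, field names, and example values are hypothetical and do not correspond to any HCP API or schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredObject:
    path: str
    data: bytes
    # System metadata: size, hashes, ingest time, retention state, and so on.
    system_metadata: dict = field(default_factory=dict)
    # Optional annotation attached by the application at write time.
    custom_metadata: Optional[str] = None


@dataclass
class Namespace:
    name: str
    dpl: int = 2                        # data protection level
    retention_mode: str = "enterprise"  # or "compliance"
    versioning: bool = False
    objects: dict = field(default_factory=dict)

    def put(self, obj: StoredObject) -> None:
        # Record the size as one piece of system metadata, then store the object.
        obj.system_metadata.setdefault("size", len(obj.data))
        self.objects[obj.path] = obj


@dataclass
class Tenant:
    name: str
    quota_bytes: int
    namespaces: dict = field(default_factory=dict)


@dataclass
class System:
    tenants: dict = field(default_factory=dict)


# Hypothetical example: one tenant, one compliance-mode namespace, one object.
system = System()
finance = Tenant(name="finance", quota_bytes=10 * 2**40)
invoices = Namespace(name="invoices", dpl=2, retention_mode="compliance")
finance.namespaces[invoices.name] = invoices
system.tenants[finance.name] = finance
invoices.put(StoredObject("2024/inv-001.pdf", b"%PDF...",
                          custom_metadata="<invoice id='001'/>"))
```

The point of the sketch is where policy attaches: protection level, retention mode, and versioning sit on the namespace, while the object carries its data together with both kinds of metadata.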

Retention is the feature that separates HCP from commodity object storage in regulated environments. Enterprise mode and Compliance mode differ in whether even a privileged administrator can shorten a retention period — and HCP's compliance posture has been independently assessed by Cohasset Associates against SEC, FINRA, CFTC, and MiFID II requirements. When an object is under Compliance-mode retention, it cannot be altered or deleted by anyone until the clock expires. That is the whole point.
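The difference between the two modes can be illustrated with a toy model. This is a sketch under stated assumptions, not HCP's implementation: `RetainedObject`, `can_delete`, and `shorten_retention` are hypothetical names, and the rules encode only what the paragraph above describes (plus the legal hold mentioned earlier).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetainedObject:
    retain_until: datetime
    mode: str                 # "enterprise" or "compliance"
    legal_hold: bool = False


def can_delete(obj: RetainedObject, now: datetime) -> bool:
    # An object on legal hold or still inside its retention period stays put.
    if obj.legal_hold:
        return False
    return now >= obj.retain_until


def shorten_retention(obj: RetainedObject, new_until: datetime, privileged: bool) -> None:
    if new_until >= obj.retain_until:
        raise ValueError("new date does not shorten the retention period")
    if obj.mode == "compliance":
        # Compliance mode: nobody may shorten, not even a privileged administrator.
        raise PermissionError("retention cannot be shortened in compliance mode")
    if not privileged:
        raise PermissionError("shortening retention requires a privileged administrator")
    obj.retain_until = new_until


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
enterprise_obj = RetainedObject(now + timedelta(days=365), "enterprise")
compliance_obj = RetainedObject(now + timedelta(days=365), "compliance")

shorten_retention(enterprise_obj, now - timedelta(days=1), privileged=True)
print(can_delete(enterprise_obj, now))   # retention was shortened, so deletion is possible
print(can_delete(compliance_obj, now))   # still under retention
```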

§ 03

Protocol Surface & Cross-Protocol Access

HCP's unification trick is that it ingests through multiple protocols and then makes every object available through every other enabled protocol on that namespace. Write a file over SMB, read it back as an S3 object, retrieve it again over the native REST API — same object, same metadata, no translation layer the application has to think about.

The classic platform exposes six ingest protocols:

Protocol	Role
HTTP/HTTPS (REST)	The native HCP namespace API — full metadata, custom metadata, versioning, retention control
S3 (HS3)	S3-compatible gateway for cloud-native and backup applications
CIFS / SMB	File-share ingest and access
NFS	File-share ingest and access for UNIX/Linux estates
SMTP	Email ingest — historically significant for archive workloads
WebDAV	Legacy content-management integration

The native REST API is the one to reach for when you actually care about HCP's differentiators, because it is the only protocol that exposes the full metadata, custom-metadata, and retention model. S3 is the one most modern applications will use, and it is generally good enough — but it is a lowest-common-denominator surface, and treating HCP as "just an S3 endpoint" leaves most of the platform's value on the table.
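As a rough illustration of what addressing the native REST API looks like, the sketch below builds a request URL and an authorization header without sending anything. It follows the convention for classic HCP as I understand it: a `namespace.tenant.domain` host, a `/rest` path prefix, an `Authorization: HCP <base64 username>:<md5 password>` header, and a `type=custom-metadata` query parameter. Treat all four as assumptions to verify against the documentation for your release; the host, tenant, and namespace names are hypothetical.

```python
import base64
import hashlib
from urllib.parse import quote


def hcp_auth_header(username: str, password: str) -> str:
    # Assumed format: base64-encoded username, colon, hex MD5 of the password.
    user_b64 = base64.b64encode(username.encode("utf-8")).decode("ascii")
    pass_md5 = hashlib.md5(password.encode("utf-8")).hexdigest()
    return f"HCP {user_b64}:{pass_md5}"


def object_url(namespace: str, tenant: str, domain: str, path: str,
               custom_metadata: bool = False) -> str:
    # Assumed layout: https://<namespace>.<tenant>.<domain>/rest/<object path>
    url = f"https://{namespace}.{tenant}.{domain}/rest/{quote(path.lstrip('/'))}"
    return url + "?type=custom-metadata" if custom_metadata else url


headers = {"Authorization": hcp_auth_header("myuser", "example-password")}
data_url = object_url("invoices", "finance", "hcp.example.com", "2024/inv-001.pdf")
annotation_url = object_url("invoices", "finance", "hcp.example.com",
                            "2024/inv-001.pdf", custom_metadata=True)
print(data_url)
print(annotation_url)
```

The same object path serves both the data and its annotation; only the query parameter changes, which is the kind of access the S3 surface does not give you in the same form.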

§ 04

The Management API (MAPI)

HCP is administered through MAPI, a RESTful management interface that sits parallel to the data path. Tenant and namespace lifecycle, user and group configuration, replication setup, retention class definition, chargeback and capacity reporting, and the platform's operational metrics are all driven through MAPI rather than through any single console.

This matters operationally because MAPI is what makes HCP observable on your own terms. The cluster will export per-tenant and per-namespace utilization, ingest rates, and capacity figures through MAPI endpoints, and a thin collector — a Python Prometheus textfile exporter scraping the MAPI metrics, labeled per cluster — turns the platform into a first-class citizen of a Grafana fleet view rather than something you check by logging into a vendor UI. Anyone running HCP at fleet scale who is not pulling MAPI into their own time-series stack is leaving the platform half-instrumented.

§ 05

Data Protection: DPL, Erasure Coding & the S-Series Back End

HCP protects data two ways, and the choice between them is the central capacity-versus-resilience trade-off of the platform.

Data Protection Level (DPL) is replication: the platform keeps N full copies of each object across independent failure domains, where N is the DPL and ranges from 1 to 4. DPL 2 (two copies) is the conventional baseline. Replication is simple, fast to repair, and expensive in raw capacity: every copy beyond the first adds 100% overhead.

Erasure coding trades repair simplicity for capacity efficiency. Instead of whole copies, an object's data is split into data fragments plus computed parity fragments distributed across the failure domains, so the system survives the loss of several fragments while consuming far less overhead than replication. This is the dominant mechanism on the dense back end.

That dense back end is the HCP S-Series: the erasure-coded capacity nodes (the S10/S30 lineage and their successors) that present capacity to an HCP front end over an S3-style interface. Internally, an S-Series node is its own storage system: enclosures of high-density drives, a storage-management layer that owns drive state and the logical-disk abstraction, and the driveman control plane through which drives are added, evacuated, and reconciled. The S-Series runs a wide Reed-Solomon code: the RS(20,6) profile spreads each stripe across 26 fragments (20 data, 6 parity), so a stripe survives the concurrent loss of up to six fragments and becomes unrecoverable only when a seventh is lost.
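The capacity side of the trade-off is simple arithmetic. The sketch below compares replication at a given DPL with the RS(20,6) profile described above; the function names are hypothetical, and the figures ignore any filesystem or spare-capacity overhead a real system would add.

```python
from fractions import Fraction


def replication_overhead(dpl: int) -> Fraction:
    # Extra capacity beyond the object itself: one full copy per level above 1.
    return Fraction(dpl - 1)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    # Extra capacity is the parity share relative to the data share.
    return Fraction(parity_fragments, data_fragments)


def raw_per_usable(overhead: Fraction) -> Fraction:
    # Raw bytes consumed for every usable byte stored.
    return 1 + overhead


for label, overhead in [
    ("DPL 2", replication_overhead(2)),
    ("DPL 3", replication_overhead(3)),
    ("RS(20,6)", erasure_overhead(20, 6)),
]:
    print(f"{label}: overhead {float(overhead):.0%}, "
          f"raw per usable byte {float(raw_per_usable(overhead)):.2f}")
```

Under these assumptions DPL 2 consumes 2 raw bytes per usable byte while RS(20,6) consumes 1.3, and the erasure-coded stripe tolerates six lost fragments where DPL 2 tolerates one lost copy.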

Two hard-won operational truths about the S-Series back end are worth stating plainly, because they are invisible until they bite:

Data and metadata fail independently.

A stripe's bytes can be perfectly intact on disk while the metadata that describes how those fragments compose a stripe is damaged or orphaned. Drive-cycling operations are a classic trigger: when a cycled drive returns under a new logical-disk identity, stripe metadata can be left pointing at fragments that no longer answer to the expected identity, breaching the recoverability threshold even though no data was actually lost. This is a metadata-integrity problem wearing a data-loss costume, and it is diagnosed at the driveman and storage-management layers, not at the object API.
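A toy model makes the failure mode concrete. It is an illustration only, not a description of S-Series internals: the logical-disk identifiers are hypothetical, and the only rule it encodes is that an RS(20,6) stripe needs at least 20 of its 26 fragments to be reachable under the identities its metadata records.

```python
def stripe_state(fragment_ids, answering_ids, data_fragments=20):
    # A fragment counts as reachable only if the identity recorded in the
    # stripe metadata is one that currently answers.
    reachable = sum(1 for frag in fragment_ids if frag in answering_ids)
    return {
        "reachable": reachable,
        "missing": len(fragment_ids) - reachable,
        "recoverable": reachable >= data_fragments,
    }


# Stripe metadata records 26 fragments, one per logical disk.
recorded = [f"ld-{i}" for i in range(26)]
answering = set(recorded)

# Seven drives are cycled and return under new identities. The bytes are
# still on disk, but the recorded identities no longer answer.
for i in range(7):
    answering.discard(f"ld-{i}")
    answering.add(f"ld-new-{i}")

print(stripe_state(recorded, answering))
```

With seven identities changed, only 19 fragments are reachable and the stripe falls below the threshold even though no drive lost data; had six been cycled, it would still be recoverable.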

The driveman control plane is not infallible under load.

Operations such as admin driveman add can fail transiently when the node is under high CPU pressure — an API timeout, not a hardware fault — and the correct response is retry, not panic. Reading "drive add failed" as "drive is dead" has cost more people more hours than the actual fault rate ever has.
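The "retry, not panic" response can be sketched as a generic retry-with-backoff wrapper. Everything here is hypothetical scaffolding: `TransientTimeout` stands in for however your tooling reports an API timeout, and `operation` is whatever callable you wrap around the drive-add step. No driveman flags or output formats are assumed.

```python
import time


class TransientTimeout(Exception):
    """Hypothetical marker for an API timeout, as opposed to a hardware fault."""


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    # Retry only on timeouts, doubling the wait each time; any other error
    # propagates immediately so that real faults are not masked.
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientTimeout:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with a stand-in operation that times out twice, then succeeds.
calls = {"count": 0}


def flaky_drive_add():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientTimeout("control plane busy")
    return "added"


waits = []
print(retry_with_backoff(flaky_drive_add, sleep=waits.append), waits)
```

Passing `sleep` as a parameter keeps the wrapper testable, and backing off gives a CPU-starved node room to recover instead of adding to the pressure.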

The discipline that falls out of this: when current diagnostic data and the reputation of the component disagree, follow the data. A drive the inventory swears is healthy at the SCSI and driveman layers is healthy, regardless of what an alert is shouting.

§ 06

Metadata Query Engine & the Query API

HCP indexes system and custom metadata into the Metadata Query Engine and exposes it through the Metadata Query API. This is what lets the platform answer "find every object in this namespace ingested before this date, under this retention class, carrying this custom-metadata annotation" without walking the object store. For compliance discovery, analytics pre-filtering, and lifecycle automation, the MQE is the difference between an answer in seconds and a crawl that takes hours — a difference real customers have measured exactly that way. Treat custom metadata as schema you design deliberately at ingest, not as an afterthought, and the query layer rewards you for it.
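Why an index answers such questions without walking the store can be shown with a small in-memory model. This is not the Metadata Query API and makes no claim about how the engine is built; `MetadataIndex`, its methods, and the field names are hypothetical.

```python
from collections import defaultdict
from datetime import date


class MetadataIndex:
    def __init__(self):
        self._by_field = defaultdict(set)   # (field, value) -> object paths
        self._ingested = {}                 # object path -> ingest date

    def add(self, path, ingested, **fields):
        self._ingested[path] = ingested
        for name, value in fields.items():
            self._by_field[(name, value)].add(path)

    def find(self, ingested_before, **fields):
        # Intersect the posting sets for each requested field, then apply the
        # date filter. No object data is read at any point.
        sets = [self._by_field.get(item, set()) for item in fields.items()]
        candidates = set.intersection(*sets) if sets else set(self._ingested)
        return sorted(p for p in candidates if self._ingested[p] < ingested_before)


index = MetadataIndex()
index.add("a.pdf", date(2019, 3, 1), retention_class="finance-7y", doc_type="invoice")
index.add("b.pdf", date(2023, 6, 1), retention_class="finance-7y", doc_type="invoice")
index.add("c.pdf", date(2018, 1, 9), retention_class="hr-10y", doc_type="contract")

print(index.find(date(2020, 1, 1), retention_class="finance-7y", doc_type="invoice"))
```

The query touches only the index entries for the requested annotations, which is also why the annotations have to exist, and be consistent, at ingest time.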

§ 07

Tiering, Service Plans & Storage Components

HCP separates what protection and placement behavior a namespace gets from where the bytes physically live through the service plan abstraction. A service plan defines the ingest tier, the protection scheme, and the tiering rules that move objects across storage components over their lifetime.

A storage component is a tierable target. The ingest tier is typically fast primary storage; from there, policy can tier objects out to capacity-optimized S-Series, to extended targets such as NFS volumes, or off to public cloud object stores (S3, Azure Blob, Google Cloud Storage) — with HCP retaining the authoritative metadata and the governance posture regardless of where the bytes ultimately rest. This is the mechanism behind HCP's hybrid-cloud story: the platform is the system of record and the policy engine, and commodity capacity — on-prem or in someone else's data center — becomes an interchangeable tier underneath it.
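The tiering idea can be reduced to a small rule evaluation. This is a minimal sketch assuming age-based rules only; real service plans carry more criteria than this, and the component names and thresholds below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRule:
    min_age_days: int
    component: str


def placement(age_days: int, ingest_tier: str, rules) -> str:
    # Start on the ingest tier, then apply every rule whose age threshold has
    # been reached; the rule with the highest threshold met wins.
    target = ingest_tier
    for rule in sorted(rules, key=lambda r: r.min_age_days):
        if age_days >= rule.min_age_days:
            target = rule.component
    return target


plan = [TierRule(30, "s-series"), TierRule(365, "cloud-s3")]
for age in (10, 90, 400):
    print(age, placement(age, "primary", plan))
```

The namespace only references the plan; where the bytes sit at any given age is a consequence of the rules, while metadata and governance remain with the platform.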

Classic HCP nodes themselves come in two storage topologies worth naming: RAIN (Redundant Array of Independent Nodes), where each node owns internal storage, and SAIN (SAN-Attached Array of Independent Nodes), where nodes attach to external SAN-presented capacity such as a VSP. The choice shapes failure domains, rebuild behavior, and how the cluster grows.

§ 08

Replication Topologies

HCP replication operates on namespaces and supports topologies well beyond simple primary-to-DR:

Active/passive: one writable copy and one standby, the conventional disaster-recovery posture.
Active/active: both sites accept writes and the platform reconciles changes between them; suited to geographically split read/write workloads.
Chain: A replicates to B, which replicates to C, for staged or fan-through distribution.
Ring: multiple sites in a replication loop for mutual protection.

Replication is granular, configurable per namespace, and metadata-aware — it carries the governance state, not just the bytes. Capacity reporting that ignores replication topology will mislead you, because a replicated object is consuming capacity in more than one place by design; capacity dashboards for an HCP fleet must account for the replication factor per cluster or they will quietly lie about headroom.
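The capacity point can be made concrete with a few lines of arithmetic. The sketch assumes each cluster holding a replica applies its own protection factor (2 for DPL 2, 26/20 for RS(20,6)); the cluster names, sizes, and function names are hypothetical.

```python
from fractions import Fraction


def physical_footprint(logical_bytes: int, protection_by_cluster: dict) -> dict:
    # Each cluster holding a replica stores the logical bytes multiplied by
    # its own protection factor.
    return {
        cluster: int(logical_bytes * factor)
        for cluster, factor in protection_by_cluster.items()
    }


def headroom(raw_capacity: dict, footprint: dict) -> dict:
    # Remaining raw capacity per cluster once the protected footprint is counted.
    return {cluster: raw_capacity[cluster] - footprint.get(cluster, 0)
            for cluster in raw_capacity}


TB = 10**12
footprint = physical_footprint(
    100 * TB,
    {"site-a": Fraction(2), "site-b": Fraction(26, 20)},  # DPL 2 and RS(20,6)
)
print({cluster: used // TB for cluster, used in footprint.items()})
print(sum(footprint.values()) // TB)
print(headroom({"site-a": 500 * TB, "site-b": 300 * TB}, footprint))
```

In this example 100 TB of logical data occupies 330 TB of raw capacity across the two sites, which is the number a headroom dashboard needs to subtract.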

§ 09

HCP for Cloud Scale: A Different Animal

HCP for Cloud Scale shares the HCP name and the governance heritage, but its architecture is a clean break from classic HCP. Where classic HCP is a tightly integrated appliance-lineage system, HCP CS is a containerized, scale-out, S3-first platform built for billions of objects and the kind of concurrency that S3-native applications generate.

Its defining structural choice is the separation of metadata from data. A distributed metadata database tracks objects independently of the data services that store the bytes, and a policy engine governs lifecycle across the distributed cluster. Services run as containers and scale horizontally; metadata and data scale on their own axes rather than in lockstep. The S3 surface is the primary citizen here, not one protocol among six.

The operational consequence is that HCP CS behaves like distributed-systems infrastructure — you reason about it in terms of service replicas, metadata-database health, and the MAPI surface that the distributed cluster exposes, including per-cluster metrics that belong in your Prometheus/Grafana fleet view alongside everything else. Choose classic HCP when retention semantics, the full six-protocol surface, and content-repository behavior dominate. Choose HCP CS when S3 concurrency and object count at massive scale dominate.

§ 10

VSP One Object: The Current Generation

VSP One Object is the newest member of the family and the clearest signal of where the portfolio is heading. It re-architects the object tier on a distributed SQL metadata engine — a YugabyteDB foundation, with per-worker memory (on the order of 256 GiB RAM per worker) standing out as the dominant hardware sizing driver, because the metadata engine, not the data path, is the thing you provision the cluster around. It integrates with QLC-based VSP One Block midrange systems and folds in intelligent services such as PII Discovery for automatic identification and governance of sensitive data across stored objects.

The strategic read: Hitachi is moving the object tier off the classic HCP internals and onto a modern distributed-database substrate, while keeping the governance, compliance, and durability promises that made HCP credible in regulated environments in the first place. The branding consolidated under VSP One; the durability-and-retention DNA did not change.

§ 11

Operating HCP at Fleet Scale

Everything above is architecture. The following is what separates someone who has deployed HCP from someone who runs it across a multi-site, multi-petabyte estate:

Instrument through MAPI, not the UI. Pull per-cluster, per-tenant, per-namespace metrics into your own time-series store. The vendor console is for one-off investigation; your Grafana fleet board is for actually knowing the state of the estate.
Account for replication and DPL in capacity math. Raw capacity, usable capacity, and protected capacity are three different numbers, and only the third one tells you how much real headroom you have.
Treat the S-Series as a storage system in its own right. Its drive state, logical-disk identities, and stripe metadata have failure modes that never surface at the object API. Monitor it at its own layer, with its own tooling.
Believe the diagnostic data over the alert and over the reputation. Transient driveman timeouts look like hardware death; healthy-at-the-SCSI-layer drives get blamed for metadata problems they did not cause. The instrument that reads the actual device wins the argument.
Design custom metadata as schema. The Metadata Query Engine is only as useful as the annotations you deliberately attach at ingest. Retrofitting metadata onto an existing namespace is expensive; designing it up front is nearly free.

§ 12

The One-Paragraph Summary

Hitachi Content Platform is a governed object storage system whose entire design serves durability, immutability, and compliant retention at exabyte scale. Its logical model — system, tenant, namespace, object-with-metadata — gives it administrative isolation and content-repository behavior that commodity object stores lack. It ingests through six protocols and serves every object through all of them; it protects data through replication or wide erasure coding on a dense S-Series back end whose data and metadata fail independently; it indexes metadata for query; it tiers across on-prem and cloud under a single policy engine; and it replicates in topologies from simple DR to active/active rings. The family now spans classic HCP, the containerized S3-scale HCP for Cloud Scale, the S-Series capacity back end, and the distributed-SQL VSP One Object — unified under VSP One in branding, unified by the same governance promises in substance. Run it through its management API, trust its diagnostic data over its reputation, and it will hold the line on data nobody can afford to lose.