Entity Store is a FoundationDB layer that implements a data model for versioned entities with fine-grained authorization and lineage. It is implemented as a Python library that exposes its own API on top of the FoundationDB Python bindings. Versioning of entities is automatic: a modification to an immutable field produces a new version rather than mutating the existing one. Versions form a parentage tree, and any version can be explicitly selected for use. Read and write authorizations are recorded separately at the level of individual fields (a.k.a. “cell-based security”). Lineage is recorded via labeled, directed multigraphs, which represent version parentage and other forms of provenance.
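The versioning, field-level authorization, and lineage behavior described above can be illustrated with a small in-memory sketch. It is not Entity Store's API: the names `Version`, `Entity`, `update`, `read`, and the `parent_of` edge label are hypothetical, and a plain Python dictionary stands in for FoundationDB.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Version:
    """One version of an entity (hypothetical in-memory stand-in)."""
    version_id: int
    parent_id: Optional[int]
    fields: Dict[str, object]
    readers: Dict[str, Set[str]]  # per-field read grants ("cell-based")


class Entity:
    def __init__(self, fields, readers):
        self.versions = {}
        self.edges = []  # (source, label, target): a labeled, directed multigraph
        self._next_id = 1
        self.root = self._add(None, dict(fields), readers)

    def _add(self, parent_id, fields, readers):
        vid = self._next_id
        self._next_id += 1
        self.versions[vid] = Version(vid, parent_id, fields, readers)
        if parent_id is not None:
            # Parentage is recorded as one kind of lineage edge.
            self.edges.append((parent_id, "parent_of", vid))
        return vid

    def update(self, base_id, changes):
        """Never mutate: derive a new version from the chosen base version."""
        base = self.versions[base_id]
        new_fields = {**base.fields, **changes}
        readers = {name: set(users) for name, users in base.readers.items()}
        return self._add(base_id, new_fields, readers)

    def read(self, version_id, field_name, user):
        """Return a field value only if the user has a read grant for it."""
        version = self.versions[version_id]
        if user not in version.readers.get(field_name, set()):
            raise PermissionError(f"{user} may not read {field_name}")
        return version.fields[field_name]


model = Entity(
    {"name": "churn-model", "accuracy": 0.81},
    readers={"name": {"alice", "bob"}, "accuracy": {"alice"}},
)
v2 = model.update(model.root, {"accuracy": 0.86})
v3 = model.update(model.root, {"accuracy": 0.84})  # sibling branch of v2
```

Both updates start from the root version, so the parentage tree has two branches, and the root's fields remain unchanged. In this sketch a new version simply inherits the read grants of its base; how the actual library propagates authorizations is not specified here.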
Each entity is modeled as a collection of objects that represent its distinct versions, along with a “core” object for non-versioned fields. Objects are schemaless and consist of any number of fields with values. Fields can be single-valued or set-valued, and either case can optionally be indexed. Fields can also have large blob values.
FoundationDB was chosen because it allows rich data models to be flexibly mapped to its key-value store. Required features such as versioning and lineage were straightforward to incorporate into the layer. We rely on FoundationDB’s distributed transactions to give multiple clients concurrent access to entities without the danger of erroneous results or data corruption.
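One way to picture such a mapping is as a key layout over an ordered key-value store. The sketch below is an assumption for illustration only and does not show Entity Store's actual key schema: the `OrderedKV` class is a toy in-memory stand-in, and the `"e"`/`"i"` prefixes, the `"core"`/`"v"` markers, and the helper names are all hypothetical. A real layer would encode keys with the tuple encoding provided by the FoundationDB Python bindings and perform its writes inside transactions.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value map standing in for the real store."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}

    def set(self, key, value=b""):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_prefix(self, prefix):
        """Return all (key, value) pairs whose key starts with prefix, in order."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            out.append((key, self._values[key]))
        return out


def write_core_field(kv, entity, name, value):
    """Non-versioned field stored on the entity's "core" object."""
    kv.set(("e", entity, "core", name), value)


def write_field(kv, entity, version, name, value):
    """Single-valued field: one key per field, holding the value."""
    kv.set(("e", entity, "v", version, name), value)


def add_to_set_field(kv, entity, version, name, member, indexed=False):
    """Set-valued field: one key per member; an optional index entry
    maps (field, member) back to the (entity, version) that holds it."""
    kv.set(("e", entity, "v", version, name, member))
    if indexed:
        kv.set(("i", name, member, entity, version))


kv = OrderedKV()
write_core_field(kv, "ds1", "owner", "alice")
write_field(kv, "ds1", 1, "rows", 1000)
add_to_set_field(kv, "ds1", 1, "tags", "raw", indexed=True)
write_field(kv, "ds1", 2, "rows", 1200)
add_to_set_field(kv, "ds1", 2, "tags", "clean", indexed=True)

# All fields of version 1 come back from a single prefix scan.
version_1 = kv.get_prefix(("e", "ds1", "v", 1))
# The index answers "which entity versions carry the tag 'raw'?"
raw_hits = [key[3:] for key, _ in kv.get_prefix(("i", "tags", "raw"))]
```

Because keys sharing a prefix are adjacent in an ordered store, reading one version of an entity or looking up an indexed value becomes a single range read in this layout.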
Our use case for Entity Store is as a backend for a metadata manager in a service for data science and machine learning pipelines. These pipelines interact with multiple types of entities, including data sets, notebooks, machine learning models, and experiments. Each of these entities has associated metadata that must be versioned, protected by authorization, and tracked for lineage. The metadata manager uses GraphQL to define schemas for entity types, queries, and mutations.