Sources - Penumbra

Sources are the provenance layer of the graph. Every entity traces back to a source: a document, a page, a web clip, a data feed. You register source material, extract it through a shape to coerce it into typed entities, then read those entities back scoped to the source they came from.

register a source ──▶ extract through a shape ──▶ typed entities in the graph
   pb.sources.register      pb.extract                 pb.sources.entities

Extraction is a lens, not a one-shot import. The same source can be extracted through different shapes to surface different structure, because a source is just material and a shape is the perspective you read it with.
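As a minimal sketch of the lens idea, the helper below runs one registered source through several shapes in turn. The `pb.extract` arguments mirror the call shown later on this page. The `ExtractInput` and `ExtractClient` types and the `extractThroughShapes` helper are hypothetical names introduced for illustration, not part of the SDK.

```typescript
// Hypothetical structural types: only the part of the client this sketch uses.
// The argument shape mirrors the pb.extract call shown on this page.
interface ExtractInput {
  source: { text: string; external_id: string };
  shapeId: string;
  apply: boolean;
}

interface ExtractClient {
  extract(input: ExtractInput): Promise<unknown>;
}

// Run the same source material through each shape, one after another.
// Every extraction is staged (apply: false), so nothing is applied here.
async function extractThroughShapes(
  pb: ExtractClient,
  sourceId: string,
  text: string,
  shapeIds: string[],
): Promise<void> {
  for (const shapeId of shapeIds) {
    await pb.extract({
      source: { text, external_id: sourceId },
      shapeId,
      apply: false,
    });
  }
}
```

Assuming your client matches that type, a call might look like `await extractThroughShapes(pb, source.id, documentText, shapeIds)`, where `shapeIds` lists the shapes you want to read the source through.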

pb.sources

Method	Description
`pb.sources.register(input)`	Register source material so it can be extracted.
`pb.sources.get(id)`	Read a source.
`pb.sources.entities(id)`	The entities extracted from this source.
`pb.sources.entityStats(id)`	Counts of what was extracted, including a per-shape breakdown.

register

Register a source. For files, upload the content to storage first, then register the reference; for text, pass it directly. The returned source id is what you extract and read against.

const source = await pb.sources.register({
  type: "document",
  name: "Acme MSA.pdf",
  // ...source reference or content, per your upload flow
});

entities and entityStats

Read what extraction produced, scoped to the source. This is the graph entry point: from a source you can see exactly which entities it grounds.

const entities = await pb.sources.entities(source.id);
const stats = await pb.sources.entityStats(source.id);

console.log(stats); // counts, including a per-shape breakdown
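The two reads are independent, so they can be issued together. The sketch below assumes only the two method names from the table above. The `SourcesReader` type and the `readSourceOutput` helper are hypothetical, and the return types are left as `unknown` because this page does not specify them.

```typescript
// Hypothetical structural type covering only the two read methods used here.
// Return types stay `unknown` because this page does not specify them.
interface SourcesReader {
  entities(id: string): Promise<unknown>;
  entityStats(id: string): Promise<unknown>;
}

// Fetch the extracted entities and their counts for one source concurrently.
async function readSourceOutput(sources: SourcesReader, sourceId: string) {
  const [entities, stats] = await Promise.all([
    sources.entities(sourceId),
    sources.entityStats(sourceId),
  ]);
  return { entities, stats };
}
```

Assuming `pb.sources` matches that type, usage could look like `const { entities, stats } = await readSourceOutput(pb.sources, source.id);`.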

Coercing a source into the graph

Registering material does not put anything in the graph. Extraction does: run the source through a shape with pb.extract, which coerces the unstructured material into entities and relationships that match the shape. The result is staged through a delta for review.

// 1. Register the material.
const source = await pb.sources.register({ type: "document", name: "Acme MSA.pdf" });

// 2. Coerce it into graph structure through a shape (staged).
//    `source` takes the text to extract from (`documentText` here stands for
//    the document's text, loaded by your own code); pass the registered
//    source's id as `external_id` to keep the provenance link.
await pb.extract({
  source: { text: documentText, external_id: source.id },
  shapeId: "shp_contract_terms",
  apply: false,
});

// 3. Read what landed, scoped to the source.
const entities = await pb.sources.entities(source.id);

See the ingest a document guide for the full walkthrough.
