Documents - Penumbra

pb.documents ingests source files and reads what was extracted from them. A document is the developer-facing handle on the source/file substrate; see Sources for the provenance view of the same material.

Ingest a document

upload is the high-level path: it requests a signed URL, uploads the bytes, and registers the file in one call.

const doc = await pb.documents.upload({
  file: bytes,                 // Buffer, Uint8Array, or Blob
  filename: "Acme MSA.pdf",
  contentType: "application/pdf",
});

console.log(doc.id, doc.status); // "...", "active" | "staged"

For lower-level control the steps are exposed individually: createSignedUploadUrl issues the URL and storage path, you upload the bytes yourself, then register records the storage path.
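The three steps above can be sketched roughly as follows. This is a minimal illustration, not the SDK's documented types: the `url` and `storagePath` field names, the input shapes, the `uploadManually` helper, and the use of an HTTP PUT against the signed URL are all assumptions made for the example. The client is passed in as a parameter so the sketch does not depend on how `pb` is constructed.

```typescript
// Hypothetical shapes: field names here are assumptions, not the documented SDK types.
interface SignedUpload { url: string; storagePath: string }
interface DocumentRef { id: string; status: "active" | "staged" }

interface DocumentsClient {
  createSignedUploadUrl(input: { filename: string; contentType: string }): Promise<SignedUpload>;
  register(input: { storagePath: string; filename: string; contentType: string }): Promise<DocumentRef>;
}

// The transport is injectable so it can be replaced (for example, by a stub in tests).
type PutBytes = (url: string, body: Blob, contentType: string) => Promise<{ ok: boolean; status: number }>;

// Assumption: the signed URL accepts an HTTP PUT with the raw bytes as the body.
const putWithFetch: PutBytes = async (url, body, contentType) => {
  const res = await fetch(url, { method: "PUT", headers: { "Content-Type": contentType }, body });
  return { ok: res.ok, status: res.status };
};

async function uploadManually(
  documents: DocumentsClient,
  file: Blob,
  filename: string,
  contentType: string,
  put: PutBytes = putWithFetch,
): Promise<DocumentRef> {
  // Step 1: ask for a signed URL and the storage path it writes to.
  const { url, storagePath } = await documents.createSignedUploadUrl({ filename, contentType });

  // Step 2: upload the bytes yourself.
  const res = await put(url, file, contentType);
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);

  // Step 3: register the uploaded file by its storage path.
  return documents.register({ storagePath, filename, contentType });
}
```

Under these assumptions, a call would look like `await uploadManually(pb.documents, blob, "Acme MSA.pdf", "application/pdf")`. Splitting the flow this way is mainly useful when the upload itself needs custom handling, such as retries or progress reporting.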

Methods

Method	Description
`pb.documents.upload(input)`	Request a signed URL, upload the bytes, and register the file in one call. Returns the document id and status.
`pb.documents.createSignedUploadUrl(input)`	Issue a signed upload URL and storage path.
`pb.documents.register(input)`	Register an already-uploaded file by storage path.
`pb.documents.create(input)`	Ingest a document from content or a reference.
`pb.documents.search(input)`	RAG search over the document’s source chunks.
`pb.documents.list(input?)`	List documents in the project.
`pb.documents.get(id)`	Read a single document.
`pb.documents.extract(id, input?)`	Run extraction over the document through a shape.

Search documents

const hits = await pb.documents.search({ query: "renewal terms" });

Over REST, these methods are the /v1/documents endpoints: upload corresponds to the signed-URL and register flow, and search maps to POST /v1/documents/search.
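For callers not using the SDK, the search call can be sketched as a plain HTTP request. Only the POST /v1/documents/search path comes from this page; the base URL, the bearer-token header, the `{ query }` JSON body, and the `buildSearchRequest` and `searchDocuments` helper names are assumptions for illustration and should be checked against the REST reference.

```typescript
// Assumptions: bearer-token auth and a JSON body of { query } are placeholders,
// not confirmed details of the REST API.
interface SearchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the request separately from sending it, so it can be inspected or logged.
function buildSearchRequest(baseUrl: string, apiKey: string, query: string): SearchRequest {
  return {
    url: new URL("/v1/documents/search", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ query }),
    },
  };
}

// Send the request and return the parsed JSON; the response shape is left as unknown
// because it is not described on this page.
async function searchDocuments(baseUrl: string, apiKey: string, query: string): Promise<unknown> {
  const { url, init } = buildSearchRequest(baseUrl, apiKey, query);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Search failed with HTTP ${res.status}`);
  return res.json();
}
```

Keeping request construction in its own function makes the endpoint mapping easy to verify without a network call.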

Related

Sources

The provenance view: read entities scoped to the source they came from.

Capture and extract

Run a document through a shape.
