pb.documents ingests source files and reads what was extracted from them. A document is the developer-facing handle on the source/file substrate; see Sources for the provenance view of the same material.

Ingest a document

upload is the high-level path: in one call it requests a signed URL, uploads the bytes, and registers the file.

// bytes holds the file contents, e.g. read with fs.promises.readFile in Node.
const doc = await pb.documents.upload({
  file: bytes,                 // Buffer, Uint8Array, or Blob
  filename: "Acme MSA.pdf",
  contentType: "application/pdf",
});

console.log(doc.id, doc.status); // status is "active" or "staged"

For lower-level control, each step is exposed individually: createSignedUploadUrl issues the signed URL and storage path, you upload the bytes yourself, and register then registers the uploaded file by its storage path.
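
The three lower-level steps can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the SDK's own implementation: the input and return field names (url, storagePath, and the arguments to register) are hypothetical, as are the LowLevelDocuments interface and the injected putBytes transport. Only the method names and their order come from the description above.

```typescript
// Hypothetical shapes: these field names are assumptions for illustration,
// not confirmed by the SDK reference.
interface SignedUpload {
  url: string;          // signed URL the bytes are sent to
  storagePath: string;  // path handed to register afterwards
}

interface DocumentRef {
  id: string;
  status: "active" | "staged";
}

interface LowLevelDocuments {
  createSignedUploadUrl(input: { filename: string; contentType: string }): Promise<SignedUpload>;
  register(input: { storagePath: string; filename: string }): Promise<DocumentRef>;
}

// Transport for step 2, injected so the caller controls retries, progress, and so on.
type PutBytes = (url: string, bytes: Uint8Array, contentType: string) => Promise<void>;

async function uploadInSteps(
  documents: LowLevelDocuments,
  putBytes: PutBytes,
  file: { bytes: Uint8Array; filename: string; contentType: string },
): Promise<DocumentRef> {
  // 1. Ask for a signed URL and the storage path it writes to.
  const signed = await documents.createSignedUploadUrl({
    filename: file.filename,
    contentType: file.contentType,
  });

  // 2. Upload the bytes yourself.
  await putBytes(signed.url, file.bytes, file.contentType);

  // 3. Register the uploaded file by its storage path.
  return documents.register({ storagePath: signed.storagePath, filename: file.filename });
}
```

Passing the transport in keeps the helper independent of how the bytes travel; in practice putBytes might wrap an HTTP PUT to the signed URL, though the HTTP method the signed URL expects is also an assumption here.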

Methods

| Method | Description |
| --- | --- |
| pb.documents.upload(input) | Signed-URL upload + register, in one call. Returns the document id and status. |
| pb.documents.createSignedUploadUrl(input) | Issue a signed upload URL and storage path. |
| pb.documents.register(input) | Register an already-uploaded file by storage path. |
| pb.documents.create(input) | Ingest a document from content or a reference. |
| pb.documents.search(input) | RAG search over the document’s source chunks. |
| pb.documents.list(input?) | List documents in the project. |
| pb.documents.get(id) | Read a single document. |
| pb.documents.extract(id, input?) | Run extraction over the document through a shape. |

Search documents

const hits = await pb.documents.search({ query: "renewal terms" });

Over REST, these are the /v1/documents endpoints: upload corresponds to the signed-URL + register flow, and search maps to POST /v1/documents/search.
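
For callers without the SDK, the search mapping might look like the sketch below. Only the method and path (POST /v1/documents/search) and the query field come from this page; the base URL, the bearer-token header, the JSON body encoding, and the buildSearchRequest helper itself are assumptions made for illustration.

```typescript
// Assumptions: bearer-token auth and a JSON body of { query }.
// Only POST /v1/documents/search is taken from the documentation.
interface SearchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSearchRequest(baseUrl: string, apiKey: string, query: string): SearchRequest {
  // Drop trailing slashes so the path is not doubled up.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: root + "/v1/documents/search",
    init: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ query }),
    },
  };
}
```

The result can be handed to fetch as fetch(req.url, req.init). Building the request separately from sending it keeps the mapping easy to inspect and test without a network call.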

Sources

The provenance view: read entities scoped to the source they came from.

Capture and extract

Run a document through a shape.