Installing Web SDK

This guide explains how to run and explore the Perso Interactive Web SDK demo applications.

What's Included in This Demo

The repository contains four demo applications, all powered by the Perso Interactive Web SDK:

| Demo | Description |
| --- | --- |
| SvelteKit | Full-featured example with server-side session creation |
| Next.js | React integration with server-side session creation via App Router |
| Vanilla JavaScript | Minimal HTML/JS integration using Vite |
| TypeScript | Same as Vanilla, but with full type safety |

All demos connect to the same Perso Interactive API and follow the same session lifecycle.

📋 Prerequisites

Every demo requires an API key. Create an account and an API key on the Perso AI Platform.

Create your account and API Key →

Before running any demo, make sure you have:

  • Node.js v20 or higher
  • pnpm package manager (npm / yarn also work when installing the SDK as a dependency)
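
As a quick sanity check of the Node.js requirement, the version test can be sketched as below. `isSupportedNode` is a hypothetical helper written for this guide, not part of the SDK; it only compares the major version against the v20 minimum stated above.

```javascript
// Returns true when a Node.js version string (e.g. "20.11.1" or "v20.11.1")
// meets the v20 minimum listed in the prerequisites.
function isSupportedNode(version, minimumMajor = 20) {
  const major = Number.parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

// In Node, process.versions.node holds the running version without the "v" prefix.
if (typeof process !== "undefined" && !isSupportedNode(process.versions.node)) {
  console.warn(`Node.js ${process.versions.node} detected; the demos expect v20 or higher.`);
}
```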

Install Node.js

nvm install 20

Install pnpm

npm install -g pnpm
🌐

Browser requirements for WebRTC

The avatar session relies on RTCPeerConnection and MediaStream APIs. These only run in a secure context, meaning the page hosting the SDK must be served over:

  • http://localhost / http://127.0.0.1 (development), or
  • https://… (production).

When enableVoiceChat / startProcessSTT is used, the browser prompts for microphone permission. Camera permission is not required because the avatar video is streamed from the server — the <video> element only receives the remote track.
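
The origin rule above can be approximated in code. This is a minimal sketch: `isAllowedOrigin` and `canStartSession` are hypothetical names, and the URL check only mirrors the two cases listed in this guide. In a real page, the browser's own `window.isSecureContext` flag is the authoritative answer.

```javascript
// Mirrors the rule above: https anywhere, plain http only on loopback hosts.
function isAllowedOrigin(pageUrl) {
  const { protocol, hostname } = new URL(pageUrl);
  if (protocol === "https:") return true;
  return protocol === "http:" && (hostname === "localhost" || hostname === "127.0.0.1");
}

// In a browser, prefer the built-in flag over URL inspection.
function canStartSession() {
  if (typeof window === "undefined") return false; // not running in a browser
  return window.isSecureContext === true;
}
```

Calling `canStartSession()` before requesting a session lets the page show a clear message instead of failing later during media negotiation.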

For headless testing (e.g. Playwright), launch Chromium with --use-fake-ui-for-media-stream --use-fake-device-for-media-stream and grant ["microphone", "camera"] permissions so the session can negotiate without user interaction.
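
The headless setup above might be collected in one place as follows. The flags and permission names come from the paragraph above; the `headlessMedia` object and its shape are illustrative, and the commented usage assumes the `playwright` package is installed.

```javascript
// Chromium flags and permissions for running the avatar session without
// a human clicking through media prompts.
const headlessMedia = {
  launchArgs: [
    "--use-fake-ui-for-media-stream",     // auto-accept the media permission prompt
    "--use-fake-device-for-media-stream", // provide synthetic mic/camera devices
  ],
  permissions: ["microphone", "camera"],
};

// Usage with Playwright (run inside an async function or an ES module):
//   import { chromium } from "playwright";
//   const browser = await chromium.launch({ args: headlessMedia.launchArgs });
//   const context = await browser.newContext({ permissions: headlessMedia.permissions });
//   const page = await context.newPage();
//   await page.goto("http://localhost:5173");
```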

🚀 Installing the SDK

Install the SDK into your own project using your preferred package manager:

npm install perso-interactive-sdk-web
yarn add perso-interactive-sdk-web
pnpm add perso-interactive-sdk-web

Usage

Client-side (browser):

import {
  createSession,
  ChatTool,
  ChatState,
} from "perso-interactive-sdk-web/client";

ChatTool is used with Tool Calling (see Tool Calling Example). ChatState members are listed in the ChatState values section.

Server-side (Node.js/SvelteKit/Next.js):

import { createSessionId, getIntroMessage, PersoUtilServer } from "perso-interactive-sdk-web/server";

TypeScript: The SDK ships with full type definitions via exports.types in its package.json. No manual setup is needed in a modern TypeScript project (moduleResolution: "bundler" or "node16") — just import from the subpath and types resolve automatically.


Running the demo repo (optional)

If you cloned the SDK repository itself to explore the demos, install its dependencies and build the SDK from the repo root:

pnpm install
pnpm build:sdk

You do not need these commands when using the SDK as a dependency in your own project.


Minimum runnable example (from scratch)

If you want to integrate the SDK into a new project without using any of the framework demos, the shortest working setup is Vite + a small Node server. Vite serves the browser bundle; Node handles session issuance so the API key stays off the client.

1. Create the project

mkdir perso-quickstart && cd perso-quickstart
pnpm init
pnpm add perso-interactive-sdk-web
pnpm add -D vite

2. package.json

{
  "name": "perso-quickstart",
  "type": "module",
  "scripts": {
    "server": "node server.mjs",
    "dev": "vite"
  },
  "dependencies": {
    "perso-interactive-sdk-web": "^1.3.4"
  },
  "devDependencies": {
    "vite": "^5.0.0"
  }
}

3. server.mjs — issues a new session ID on each GET /api/session

import http from "node:http";
import {
  PersoUtilServer,
  createSessionId,
  getIntroMessage,
} from "perso-interactive-sdk-web/server";

const apiServerUrl = "https://platform.perso.ai";
const apiKey = process.env.PERSO_INTERACTIVE_API_KEY;
if (!apiKey) throw new Error("Set PERSO_INTERACTIVE_API_KEY");

const server = http.createServer(async (req, res) => {
  // Allow the Vite dev server (http://localhost:5173) to fetch this endpoint.
  res.setHeader("Access-Control-Allow-Origin", "*");

  if (req.method === "GET" && req.url === "/api/session") {
    try {
      const llms    = await PersoUtilServer.getLLMs(apiServerUrl, apiKey);
      const ttss    = await PersoUtilServer.getTTSs(apiServerUrl, apiKey);
      const stts    = await PersoUtilServer.getSTTs(apiServerUrl, apiKey);
      const styles  = await PersoUtilServer.getModelStyles(apiServerUrl, apiKey);
      const prompts = await PersoUtilServer.getPrompts(apiServerUrl, apiKey);

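      const selectedStyle  = styles.find((s) => s.platform_type === "webrtc");
      const selectedPrompt = prompts[0];
      // Fail with a readable 500 instead of a TypeError on `undefined.name`.
      if (!selectedStyle || !selectedPrompt) {
        throw new Error("No WebRTC model style or prompt is available for this API key");
      }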

      const sessionId = await createSessionId(apiServerUrl, apiKey, {
        using_stf_webrtc: true,
        model_style: selectedStyle.name,
        prompt: selectedPrompt.prompt_id,
        llm_type: llms[0].name,
        tts_type: ttss[0].name,
        stt_type: stts[0].name,
      });

      const introMessage = await getIntroMessage(
        apiServerUrl,
        apiKey,
        selectedPrompt.prompt_id,
      );

      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify({ apiServerUrl, sessionId, introMessage }));
    } catch (e) {
      res.statusCode = 500;
      res.end(JSON.stringify({ error: String(e) }));
    }
    return;
  }

  res.statusCode = 404;
  res.end();
});

server.listen(3000, () => {
  console.log("Session endpoint: http://localhost:3000/api/session");
});

4. index.html (served by Vite)

<!doctype html>
<html>
  <body>
    <video id="avatar" autoplay playsinline width="480" height="640"></video>

    <p><span id="status">Ready</span></p>
    <ul id="log"></ul>
    <input id="msg" placeholder="Type a message..." />
    <button id="send">Send</button>
    <button id="mic">Speak</button>

    <script type="module" src="/main.js"></script>
  </body>
</html>

5. main.js — fetches a sessionId, creates the session, wires up UI

import { createSession, ChatState } from "perso-interactive-sdk-web/client";

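// Fetch a fresh sessionId from our own server (keeps API key off the client).
const response = await fetch("http://localhost:3000/api/session");
if (!response.ok) {
  throw new Error(`Session request failed with HTTP ${response.status}`);
}
const { apiServerUrl, sessionId } = await response.json();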

// 5-arg modern signature: (apiServer, sessionId, width, height, clientTools)
const session = await createSession(apiServerUrl, sessionId, 480, 640, []);
session.setSrc(document.getElementById("avatar"));

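// Chat transcript (newest at [0]); reversed so the oldest entry renders first.
// Nodes are built with textContent so message text is never parsed as HTML.
session.subscribeChatLog((log) => {
  const items = log.slice().reverse().map((c) => {
    const li = document.createElement("li");
    const who = document.createElement("b");
    who.textContent = c.isUser ? "You: " : "Avatar: ";
    li.append(who, c.text);
    return li;
  });
  document.getElementById("log").replaceChildren(...items);
});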

// Pipeline state — empty Set means fully idle / ready for input
session.subscribeChatStates((states) => {
  const el = document.getElementById("status");
  if (states.size === 0)                      el.textContent = "Ready";
  else if (states.has(ChatState.SPEAKING))    el.textContent = "Avatar is speaking…";
  else if (states.has(ChatState.RECORDING))   el.textContent = "Listening…";
  else                                        el.textContent = "Thinking…";
});

// Send a text message
document.getElementById("send").addEventListener("click", async () => {
  const input = document.getElementById("msg");
  const text = input.value.trim();
  if (!text) return;
  input.value = "";
  await session.processChat(text);
});

// Toggle microphone-driven voice input
let recording = false;
document.getElementById("mic").addEventListener("click", async () => {
  if (!recording) {
    await session.startProcessSTT();
    recording = true;
    document.getElementById("mic").textContent = "Stop";
  } else {
    await session.stopProcessSTT();
    recording = false;
    document.getElementById("mic").textContent = "Speak";
  }
});

6. Run

# Terminal A — session server
PERSO_INTERACTIVE_API_KEY=pak-xxxxxxxx pnpm run server

# Terminal B — Vite dev server
pnpm run dev

Open the Vite URL (default http://localhost:5173) and you have a working avatar with text + voice chat. Replace pieces (e.g. swap Node for Next.js API routes, swap Vite for Webpack) as your stack requires — the public contract is just GET /api/session returning { apiServerUrl, sessionId }.
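
Because the browser depends only on that response shape, it can help to validate it before calling createSession. The sketch below assumes that `apiServerUrl` and `sessionId` arrive as non-empty strings and that failures carry an `error` string, as in the server.mjs above. `parseSessionResponse` is a hypothetical helper, not an SDK export.

```javascript
// Validates the JSON body returned by GET /api/session.
function parseSessionResponse(body) {
  if (body === null || typeof body !== "object") {
    throw new TypeError("Session response must be a JSON object");
  }
  // server.mjs answers failures with { error: "..." }.
  if (typeof body.error === "string") {
    throw new Error(`Session endpoint failed: ${body.error}`);
  }
  for (const key of ["apiServerUrl", "sessionId"]) {
    if (typeof body[key] !== "string" || body[key] === "") {
      throw new TypeError(`Session response is missing "${key}"`);
    }
  }
  return {
    apiServerUrl: body.apiServerUrl,
    sessionId: body.sessionId,
    // introMessage is optional; default to an empty string.
    introMessage: typeof body.introMessage === "string" ? body.introMessage : "",
  };
}
```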


SvelteKit Demo

apps/svelte (@perso-interactive-sdk-web/app-svelte)

The SvelteKit demo demonstrates server-side session creation and is recommended if you need:

  • Secure API key handling
  • Session configuration
  • SSR-compatible architecture

Configuration

Before running the demo, configure your API key.

Create apps/svelte/.env and set the environment variable:

PERSO_INTERACTIVE_API_KEY=YOUR_API_KEY

Or define it in a constants file:

// src/lib/constant.ts
export const persoInteractiveApiKey = "YOUR_API_KEY";

Session creation is handled on the server in:

src/routes/session/+server.ts

This endpoint:

  1. Fetches available models and settings
  2. Creates a Perso session ID
  3. Returns the session ID to the client

Run it with:

pnpm svelte

The app starts on http://localhost:5173. When the page loads, a session is automatically created via the /session route and the avatar connects over WebRTC.


Next.js Demo

apps/nextjs (@perso-interactive-sdk-web/app-nextjs)

The Next.js demo is a React-based example with server-side session creation via the App Router.

Use this demo if you want:

  • A React integration with Next.js App Router
  • Server-side API key protection
  • A production-like reference for SDK usage

Configuration

Create a .env.local file in apps/nextjs/:

# .env.local
PERSO_INTERACTIVE_API_KEY=YOUR_API_KEY

Run it with:

pnpm nextjs

The app starts on http://localhost:5174. When the page loads, a session is automatically created via the /api/session route and the avatar connects over WebRTC.


Vanilla JavaScript Demo

apps/vanilla (@perso-interactive-sdk-web/app-vanilla)

The Vanilla demo is a minimal HTML + JavaScript example powered by Vite.

Use this demo if you want:

  • The simplest possible integration
  • No framework dependencies
  • A quick reference for SDK usage

Run it with:

pnpm vanilla

The app starts on http://localhost:5173. When the page loads, enter your API server URL and API key, configure session settings, and press START to connect the avatar over WebRTC.

If port 5173 is already in use (for example, if you are running the SvelteKit or TypeScript demo concurrently), Vite will automatically pick the next available port — check the terminal output for the actual URL.


TypeScript Demo

apps/typescript (@perso-interactive-sdk-web/app-typescript)

The TypeScript demo is identical to the Vanilla demo in behavior and UI, but adds:

  • Full SDK typings
  • Compile-time safety
  • Better IDE support

Run it with:

pnpm typescript

The app starts on http://localhost:5173. When the page loads, enter your API server URL and API key, configure session settings, and press START to connect the avatar over WebRTC.

If port 5173 is already in use (for example, if you are running another demo concurrently), Vite will automatically pick the next available port — check the terminal output for the actual URL.


SDK Reference

Available SDK Utilities

ℹ️

Client vs Server module

The SDK exposes different helpers depending on which subpath you import from.

  • perso-interactive-sdk-web/client — use in the browser. Exports top-level helpers like getAllSettings, getLLMs, getTTSs, getSTTs, getModelStyles, getPrompts, getDocuments, getBackgroundImages, getMcpServers, getTextNormalizations, getSessionInfo, plus createSession, createSessionId, ChatTool, ChatState, etc.
  • perso-interactive-sdk-web/server — use in Node/server environments. Exports createSessionId, getIntroMessage, getSessionTemplates, getSessionTemplate, ApiError, and the PersoUtilServer class (static methods for fetching options). There is no getAllSettings on the server subpath — use PersoUtilServer static methods instead.

Fetching configuration (browser)

You can retrieve configuration options in two ways from perso-interactive-sdk-web/client.

1. All-in-one (getAllSettings)

import { getAllSettings } from "perso-interactive-sdk-web/client";

const settings = await getAllSettings(apiServerUrl, apiKey);

// Response shape (9 keys):
// {
//   llms,               // list from getLLMs()
//   modelStyles,        // list from getModelStyles()
//   ttsTypes,           // list from getTTSs()
//   sttTypes,           // list from getSTTs()
//   prompts,            // list from getPrompts()
//   documents,          // list from getDocuments()
//   backgroundImages,   // list from getBackgroundImages()
//   mcpServers,         // list from getMcpServers()
//   textNormalizations, // list from getTextNormalizations()
// }

Note: getAllSettings returns ttsTypes / sttTypes (not ttss / stts). The key names differ from the individual getter function names.
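
To make the key naming concrete, here is a minimal sketch that turns a getAllSettings result into createSessionId parameters. `buildSessionParams` is a hypothetical helper; the field names (`platform_type`, `name`, `prompt_id`) follow the examples in this guide, and picking the first entry of each list is only a placeholder selection policy.

```javascript
// Maps the getAllSettings() result to the parameter object createSessionId expects.
function buildSessionParams(settings) {
  const style = settings.modelStyles.find((s) => s.platform_type === "webrtc");
  const prompt = settings.prompts[0];
  const llm = settings.llms[0];
  const tts = settings.ttsTypes[0]; // note: ttsTypes, not ttss
  const stt = settings.sttTypes[0]; // note: sttTypes, not stts
  if (!style || !prompt || !llm || !tts || !stt) {
    throw new Error("Account is missing a WebRTC model style, prompt, LLM, TTS, or STT option");
  }
  return {
    using_stf_webrtc: true,
    model_style: style.name,
    prompt: prompt.prompt_id,
    llm_type: llm.name,
    tts_type: tts.name,
    stt_type: stt.name,
  };
}
```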

2. Individual getters

Call only the helpers you need:

  • getLLMs
  • getModelStyles
  • getBackgroundImages
  • getTTSs
  • getSTTs
  • getPrompts
  • getDocuments
  • getMcpServers
  • getTextNormalizations
  • getSessionInfo

All helpers accept:

(apiServerUrl: string, apiKey: string)

and return typed JSON responses.

Fetching configuration (server)

On the server, use the PersoUtilServer class — its methods are static and mirror the client-side helpers:

import { PersoUtilServer } from "perso-interactive-sdk-web/server";

const llms    = await PersoUtilServer.getLLMs(apiServerUrl, apiKey);
const styles  = await PersoUtilServer.getModelStyles(apiServerUrl, apiKey);
const ttss    = await PersoUtilServer.getTTSs(apiServerUrl, apiKey);
const stts    = await PersoUtilServer.getSTTs(apiServerUrl, apiKey);
const prompts = await PersoUtilServer.getPrompts(apiServerUrl, apiKey);
// Also available: getDocuments, getBackgroundImages, getMcpServers,
// getTextNormalizations, getSessionTemplates, getSessionTemplate, getSessionInfo

Use the server approach to keep your API key out of the browser.

Error Handling

The SDK provides an ApiError class for HTTP failures.

It includes:

  • status
  • code
  • detail
  • attr

Use this to map API errors to user-facing messages or retry logic.
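
One way to do that mapping is sketched below. It assumes `status` is the numeric HTTP status code, and the choice of which statuses count as retryable is an illustrative policy, not something the SDK prescribes. `describeApiError` is a hypothetical helper that accepts any object carrying the four fields above.

```javascript
// Turns an ApiError-like object into a user-facing message plus a retry hint.
function describeApiError(err) {
  const status = typeof err?.status === "number" ? err.status : 0;
  const retryable = status === 429 || status >= 500; // assumed retry policy
  let message;
  if (status === 400) {
    message = `Invalid request: ${err.detail ?? "check the session parameters"}`;
  } else if (status === 401 || status === 403) {
    message = "The API key was rejected.";
  } else if (retryable) {
    message = "The service is temporarily unavailable. Please try again.";
  } else {
    message = "Unexpected error while contacting the API.";
  }
  return { message, retryable, code: err?.code ?? null, attr: err?.attr ?? null };
}
```

In application code, guard the call with `err instanceof ApiError` (imported from the server subpath) so unrelated exceptions are not reported as API failures.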

Session Flow Overview


  1. Collect the Perso Interactive API server URL and API key from the operator.
  2. Fetch configuration options for the UI. Use either approach below depending on your needs:
    • Option A (recommended, simpler): getAllSettings(apiServerUrl, apiKey) imported from perso-interactive-sdk-web/client — returns every config (LLM, TTS, STT, model style, prompt, document, background, MCP servers, text normalizations) in one call. Ideal for initial UI load. On the server, use PersoUtilServer.getLLMs() etc. individually.
    • Option B (fine-grained): Call only the individual getters you need — getLLMs(), getTTSs(), getSTTs(), getModelStyles(), getPrompts(), getDocuments(), getBackgroundImages(), getMcpServers(), getTextNormalizations(). Use this when you want to configure only a subset, or refresh specific options selectively.
  3. When the user clicks START, invoke createSessionId on the server to obtain a fresh sessionId, deliver it to the browser (for example, via a GET /api/session endpoint your app exposes), then call createSession in the browser to obtain a Session object, and finally session.setSrc(videoElement) to bind the WebRTC media stream to a <video> element.
  4. Drive the conversation and UI via Session methods:
    • session.processChat(message) — send a user message; the SDK runs the LLM, TTS, and avatar animation.
    • session.processTTSTF(message) — make the avatar speak an exact string, bypassing the LLM.
    • session.subscribeChatLog((log) => ...) — render the chat transcript (newest entry at index [0]).
    • session.subscribeChatStates((states) => ...) — react to state changes. An empty Set means the pipeline is fully idle; otherwise use the specific members (ChatState.SPEAKING, ChatState.RECORDING, ChatState.ANALYZING, etc.) to show busy indicators or disable UI.
    • session.startProcessSTT() / session.stopProcessSTT() — enable microphone-driven voice input.
    • Provide ChatTool instances (see Tool Calling) for app-specific actions, and handle SDK errors via the provided callbacks.
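
Step 3 of the flow above can be sketched as a single function. To keep the sketch runnable on its own, the SDK's createSession and the session fetcher are passed in as arguments; `startAvatar` and its option names are hypothetical, and the 480×640 defaults are taken from the minimum runnable example.

```javascript
// Fetch a fresh sessionId, create the session, and bind it to a <video> element.
async function startAvatar({ fetchSession, createSession, videoElement, width = 480, height = 640 }) {
  const { apiServerUrl, sessionId } = await fetchSession(); // e.g. GET /api/session
  const session = await createSession(apiServerUrl, sessionId, width, height, []);
  session.setSrc(videoElement); // attach the WebRTC media stream
  return session;
}

// Browser usage (assumes an /api/session endpoint like the one in server.mjs):
//   import { createSession } from "perso-interactive-sdk-web/client";
//   const session = await startAvatar({
//     fetchSession: () => fetch("/api/session").then((r) => r.json()),
//     createSession,
//     videoElement: document.getElementById("avatar"),
//   });
```

Passing the dependencies in also makes the flow easy to exercise in unit tests with stand-in functions.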

Quick Look

Server Side

1. Create Session ID

⚠️

Required fields when using_stf_webrtc: true

When creating a session with using_stf_webrtc: true, the server requires llm_type, tts_type, stt_type, and a prompt (prompt ID). Omitting any of them returns a 400 response such as Prompt or Agent is required for Capability LLM.

Use the values returned from PersoUtilServer (server) or getAllSettings (client) to populate these fields — don't hardcode them, as the available set varies per account.
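
A pre-flight check can catch a missing field before the request is sent. This is a minimal sketch under the assumption that all four values are strings, as in the examples in this guide; `missingSessionFields` is a hypothetical helper, and it checks only the fields named above.

```javascript
// Fields the server requires when using_stf_webrtc is true.
const REQUIRED_WEBRTC_FIELDS = ["llm_type", "tts_type", "stt_type", "prompt"];

// Returns the names of required fields that are absent or blank.
function missingSessionFields(params) {
  if (!params.using_stf_webrtc) return [];
  return REQUIRED_WEBRTC_FIELDS.filter(
    (field) => typeof params[field] !== "string" || params[field].trim() === "",
  );
}
```

An empty array means the parameters are safe to pass to createSessionId; otherwise the returned names can go straight into an error message.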

ℹ️

Field value shapes (server-side)

The option getters return objects — the session params expect specific fields from those objects:

  • model_style — the model style's name (e.g. "indian_m_6_rajesh-front-ivory_shirt-earnest"). For WebRTC sessions, filter to styles where platform_type === "webrtc".
  • prompt — the prompt's prompt_id (e.g. "plp-ce0cd928..."). Not the prompt's name or id.
  • llm_type, tts_type, stt_type — the corresponding object's name field.
♻️

sessionId is single-use

Each sessionId returned by createSessionId is consumed by the first successful createSession call that uses it. Re-using a sessionId — for example, after a page reload, a failed negotiation, or an app that re-connects — returns 400: ICE server data is only available in created status.

Generate a new sessionId on the server every time the browser starts a fresh createSession. The Minimum runnable example above does this by re-fetching GET /api/session on each page load.
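
A reconnect helper that respects this rule might look like the sketch below. `connectWithFreshSession` is a hypothetical helper: `fetchSessionId` stands for a call to your own session endpoint and `connect` for a wrapper around createSession. The retry count is an arbitrary default.

```javascript
// Retries the connection, requesting a brand-new sessionId for every attempt.
async function connectWithFreshSession(fetchSessionId, connect, maxAttempts = 3) {
  let lastError = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const sessionId = await fetchSessionId(); // never reuse an earlier ID
    try {
      return await connect(sessionId);
    } catch (err) {
      lastError = err; // this ID may already be consumed, so discard it
    }
  }
  throw lastError;
}
```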

// Import from server subpath
import { createSessionId, getIntroMessage, PersoUtilServer } from "perso-interactive-sdk-web/server";

// 1. Initialize SDK
const apiServerUrl = "https://platform.perso.ai";
const apiKey = "YOUR API KEY";

// 2. Fetch available options (server-side: PersoUtilServer static methods)
const llms    = await PersoUtilServer.getLLMs(apiServerUrl, apiKey);
const ttss    = await PersoUtilServer.getTTSs(apiServerUrl, apiKey);
const stts    = await PersoUtilServer.getSTTs(apiServerUrl, apiKey);
const styles  = await PersoUtilServer.getModelStyles(apiServerUrl, apiKey);
const prompts = await PersoUtilServer.getPrompts(apiServerUrl, apiKey);

// 3. Pick WebRTC-compatible values
const selectedStyle  = styles.find(s => s.platform_type === "webrtc");
const selectedPrompt = prompts[0];

// 4. Create session id with configuration
const sessionId = await createSessionId(apiServerUrl, apiKey, {
  using_stf_webrtc: true,
  model_style: selectedStyle.name,
  prompt: selectedPrompt.prompt_id,   // use prompt_id, not id/name
  llm_type: llms[0].name,
  tts_type: ttss[0].name,
  stt_type: stts[0].name,
  // Optional:
  // document, background_image, mcp_servers,
  // padding_left, padding_top, padding_height,
});

// 5. Get intro message (optional) — used for your own UI, not passed to createSession
const introMessage = await getIntroMessage(apiServerUrl, apiKey, selectedPrompt.prompt_id);

// Deliver { apiServerUrl, sessionId, introMessage } to the browser,
// e.g. as the JSON response of your `GET /api/session` endpoint,
// and call createSession() from there. Do NOT expose apiKey to the browser.
const sessionPayload = { apiServerUrl, sessionId, introMessage };

2. Create WebRTC Session (Browser)

// Import from client subpath
import { createSession } from "perso-interactive-sdk-web/client";

// Create WebRTC session (modern 5-arg signature)
const session = await createSession(
  apiServerUrl,
  sessionId,
  chatbotWidth,     // pixels (e.g. 1080)
  chatbotHeight,    // pixels (e.g. 1920)
  clientTools ?? [], // see Tool Calling Example
);

The introMessage returned by getIntroMessage() is not passed to createSession. Use it in your own UI (for example, render it as the avatar's first message in the transcript).
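
One way to show the intro message is to merge it into the transcript your UI renders. The sketch below assumes log entries have the `{ isUser, text }` shape used in main.js and that the newest entry sits at index 0, so the intro (the oldest line) goes last. `withIntroMessage` is a hypothetical helper.

```javascript
// Returns a copy of the chat log with the intro message as the oldest entry.
function withIntroMessage(log, introMessage) {
  if (!introMessage) return log.slice();
  return [...log, { isUser: false, text: introMessage }];
}
```

Call it inside the subscribeChatLog callback before rendering, so the greeting stays visible as new messages arrive.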


Mounting the avatar to a video element

createSession does not attach to the DOM on its own. It returns a session object; you then bind that session to a <video> element by calling session.setSrc(videoElement).

// 1. Place a <video> element in your page
//    autoplay and playsinline are recommended for WebRTC media
// <video id="avatar" autoplay playsinline></video>

// 2. Create the session (DOM-independent, modern 5-arg signature)
const session = await createSession(
  apiServerUrl,
  sessionId,
  chatbotWidth,     // width in pixels (e.g. 1080)
  chatbotHeight,    // height in pixels (e.g. 1920)
  clientTools ?? [],
);

// 3. Bind the session's media stream to your <video>
const videoElement = document.getElementById("avatar") as HTMLVideoElement;
session.setSrc(videoElement);

Notes:

  • chatbotWidth and chatbotHeight are in pixels.
  • Audio is delivered on the video element's audio track (WebRTC MediaStream). No separate <audio> element is required for voice chat.
  • For complete integration patterns (error handling, chat-state subscription, UI components), see the SvelteKit, Next.js, Vanilla, or TypeScript demos in the SDK repository.

Sending messages and receiving replies

Once the avatar video is mounted, drive the conversation through methods on the Session object returned by createSession. The SDK handles the LLM call, TTS synthesis, and avatar animation — your app only needs to send text and render the updated chat log.

| Method | Purpose |
| --- | --- |
| session.processChat(message) | Send a user message. Runs through LLM → TTS → avatar (normal conversation). |
| session.processTTSTF(message) | Make the avatar speak an exact string, bypassing the LLM (scripted greetings, system notices, etc.). |
| session.subscribeChatLog((log) => ...) | Receive the full chat transcript on every update. The newest entry is at log[0]. |
| session.subscribeChatStates((states) => ...) | Receive a Set<ChatState> on every pipeline state change. Check states.size === 0 first for idle/ready: some stages can linger in the set briefly, so matching specific members first leaves the UI stuck on "Thinking…" after a reply finishes. |

For a complete, runnable wiring of these APIs (HTML + event handlers + state UI), see the main.js in Minimum runnable example above.


Voice chat (optional)

To let the user talk to the avatar via microphone instead of typing, use the SDK's built-in STT pipeline. Transcribed speech is automatically routed through processChat, so the reply flows through the same subscribeChatLog callback as text input.

| Method | Purpose |
| --- | --- |
| session.startProcessSTT() | Start capturing microphone input and transcribe it. Triggers the browser mic permission prompt on first use. |
| session.stopProcessSTT() | Stop capture. |

Use ChatState.RECORDING in subscribeChatStates to render a live "listening" indicator. A mic-toggle button wired to these two calls is shown in main.js of Minimum runnable example.


ChatState values

ChatState is an enum exported from perso-interactive-sdk-web/client. subscribeChatStates hands you a Set<ChatState> (multiple states can be active at once) so your UI can reflect exactly what the pipeline is doing:

| Value | Meaning |
| --- | --- |
| (empty Set, states.size === 0) | Idle / ready: the pipeline has finished all work. Accept new input. |
| ChatState.IDLE | Explicit idle marker (some SDK versions expose this alongside the empty set). |
| ChatState.RECORDING | Microphone is capturing user speech (startProcessSTT is active). |
| ChatState.ANALYZING | LLM is generating the response. |
| ChatState.SPEAKING | TTS audio is playing and the avatar is animating. |
| ChatState.TTS | TTS synthesis in progress (often overlaps with SPEAKING). |
| ChatState.LLM | LLM streaming in progress. |

Important: always check states.size === 0 before the individual flags. Some stages (e.g. ANALYZING) can remain in the set briefly before the pipeline fully drains, so matching specific members first will leave your UI stuck on "Thinking…" after a reply has already finished.

Use these to enable/disable the Send button, show a spinner, or gate further user input.
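
The checks above can be factored into two small functions. To keep this sketch runnable on its own it declares a stand-in `ChatState` object with placeholder string values; in an application, import the real enum from perso-interactive-sdk-web/client instead. `statusLabel` and `canSend` are hypothetical helper names.

```javascript
// Stand-in for the SDK enum; the string values are placeholders, not the SDK's.
const ChatState = Object.freeze({
  RECORDING: "RECORDING",
  ANALYZING: "ANALYZING",
  SPEAKING: "SPEAKING",
});

// Maps the active state set to a status line, checking the empty set first.
function statusLabel(states) {
  if (states.size === 0) return "Ready";
  if (states.has(ChatState.SPEAKING)) return "Avatar is speaking…";
  if (states.has(ChatState.RECORDING)) return "Listening…";
  return "Thinking…";
}

// The Send button is enabled only when the pipeline is idle and there is text.
function canSend(states, text) {
  return states.size === 0 && text.trim().length > 0;
}
```

Inside subscribeChatStates, assign `statusLabel(states)` to the status element and use `canSend(states, input.value)` to toggle the Send button's disabled attribute.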


Client Side

⚠️

Warning: Calling createSessionId on the client side is not recommended. It exposes your API key in the browser, where it can be stolen. The SDK provider assumes no responsibility for an API key compromised through a client-side implementation. Use server-side session creation instead.

1. Create Session ID + Create WebRTC Session

// Import from client subpath
import {
  getAllSettings,
  createSessionId,
  createSession,
  ChatTool,
  ChatState,
} from "perso-interactive-sdk-web/client";

// 1. Initialize SDK
const apiServerUrl = "https://platform.perso.ai";
const apiKey = "YOUR API KEY";

// 2. Fetch available options (client-side: top-level helpers)
const settings = await getAllSettings(apiServerUrl, apiKey);
const selectedStyle  = settings.modelStyles.find(s => s.platform_type === "webrtc");
const selectedPrompt = settings.prompts[0];

// 3. Create session id
const sessionId = await createSessionId(apiServerUrl, apiKey, {
  using_stf_webrtc: true,
  model_style: selectedStyle.name,
  prompt: selectedPrompt.prompt_id,
  llm_type: settings.llms[0].name,
  tts_type: settings.ttsTypes[0].name,
  stt_type: settings.sttTypes[0].name,
});

// 4. Create WebRTC Session (modern 5-arg signature)
const session = await createSession(
  apiServerUrl,
  sessionId,
  chatbotWidth,     // pixels (e.g. 1080)
  chatbotHeight,    // pixels (e.g. 1920)
  clientTools ?? [],
);

// 5. Mount + converse using session.setSrc / processChat / subscribeChatLog
//    exactly as shown in the Server Side "Sending messages and receiving replies" section.

Tool Calling Example

Client-side tool calling allows the model to trigger application-specific actions.

A reference implementation can be found here:

🔗 Perso Interactive Web SDK Tool Calling→

Web SDK API Reference

For detailed API documentation, see this repository:

🔗 Perso Interactive Web SDK API Reference→

What’s Next

Learn about Perso Interactive On-Device SDK in the next section.