Installing Web SDK
This guide explains how to run and explore the Perso Interactive Web SDK demo applications.
What's Included in This Demo
The repository contains four demo applications, all powered by the Perso Interactive Web SDK:
| Demo | Description |
|---|---|
| SvelteKit | Full-featured example with server-side session creation |
| Next.js | React integration with server-side session creation via App Router |
| Vanilla JavaScript | Minimal HTML/JS integration using Vite |
| TypeScript | Same as Vanilla, but with full type safety |
All demos connect to the same Perso Interactive API and follow the same session lifecycle.
📋 Prerequisites
Every demo requires an API key. Create an account and an API key on the Perso AI Platform.
Create your account and API Key →
Before running any demo, make sure you have:
- Node.js v20 or higher
- pnpm package manager (npm / yarn also work when installing the SDK as a dependency)
Install Node.js
nvm install 20
Install pnpm
npm install -g pnpm
Browser requirements for WebRTC
The avatar session relies on the RTCPeerConnection and MediaStream APIs. These only run in a secure context, meaning the page hosting the SDK must be served over http://localhost / http://127.0.0.1 (development), or https://… (production).
When enableVoiceChat / startProcessSTT is used, the browser prompts for microphone permission. Camera permission is not required because the avatar video is streamed from the server; the <video> element only receives the remote track.
For headless testing (e.g. Playwright), launch Chromium with --use-fake-ui-for-media-stream --use-fake-device-for-media-stream and grant ["microphone", "camera"] permissions so the session can negotiate without user interaction.
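As a rough illustration of the origin rule above, the following sketch classifies a page URL before any session is attempted. The function name isSecureOriginForWebRTC is hypothetical and not part of the SDK; it only mirrors the origins listed in this section. In a real browser, the standard window.isSecureContext property is the authoritative check.

```javascript
// Hypothetical helper (not part of the SDK): reports whether a page URL
// matches one of the origins this guide lists as acceptable.
function isSecureOriginForWebRTC(pageUrl) {
  const { protocol, hostname } = new URL(pageUrl);
  if (protocol === "https:") return true;
  // Plain HTTP only qualifies on the loopback hosts used during development.
  return (
    protocol === "http:" &&
    (hostname === "localhost" || hostname === "127.0.0.1")
  );
}

console.log(isSecureOriginForWebRTC("http://localhost:5173/")); // true
console.log(isSecureOriginForWebRTC("http://192.168.0.10:5173/")); // false
```

A check like this can be useful for showing a clear message when a teammate opens the dev server through a LAN address instead of localhost.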
🚀 Installing the SDK
Install the SDK into your own project using your preferred package manager:
npm install perso-interactive-sdk-web
yarn add perso-interactive-sdk-web
pnpm add perso-interactive-sdk-web
Usage
Client-side (browser):
import {
  createSession,
  ChatTool,
  ChatState,
} from "perso-interactive-sdk-web/client";
ChatTool is used with Tool Calling. ChatState values are documented in ChatState values.
Server-side (Node.js/SvelteKit/Next.js):
import { createSessionId, getIntroMessage, PersoUtilServer } from "perso-interactive-sdk-web/server";
TypeScript: The SDK ships with full type definitions via exports.types in its package.json. No manual setup is needed in a modern TypeScript project (moduleResolution: "bundler" or "node16"); just import from the subpath and types resolve automatically.
Running the demo repo (optional)
If you cloned the SDK repository itself to explore the demos, install its dependencies and build the SDK from the repo root:
pnpm install
pnpm build:sdk
You do not need these commands when using the SDK as a dependency in your own project.
Minimum runnable example (from scratch)
If you want to integrate the SDK into a new project without using any of the framework demos, the shortest working setup is Vite + a small Node server. Vite serves the browser bundle; Node handles session issuance so the API key stays off the client.
1. Create the project
mkdir perso-quickstart && cd perso-quickstart
pnpm init
pnpm add perso-interactive-sdk-web
pnpm add -D vite
2. package.json
{
  "name": "perso-quickstart",
  "type": "module",
  "scripts": {
    "server": "node server.mjs",
    "dev": "vite"
  },
  "dependencies": {
    "perso-interactive-sdk-web": "^1.3.4"
  },
  "devDependencies": {
    "vite": "^5.0.0"
  }
}
3. server.mjs: issues a new session ID on each GET /api/session
import http from "node:http";
import {
  PersoUtilServer,
  createSessionId,
  getIntroMessage,
} from "perso-interactive-sdk-web/server";
const apiServerUrl = "https://platform.perso.ai";
const apiKey = process.env.PERSO_INTERACTIVE_API_KEY;
if (!apiKey) throw new Error("Set PERSO_INTERACTIVE_API_KEY");
const server = http.createServer(async (req, res) => {
  // Allow the Vite dev server (http://localhost:5173) to fetch this endpoint.
  // "*" admits any origin; restrict it before using this beyond local development.
  res.setHeader("Access-Control-Allow-Origin", "*");
  if (req.method === "GET" && req.url === "/api/session") {
    try {
      const llms = await PersoUtilServer.getLLMs(apiServerUrl, apiKey);
      const ttss = await PersoUtilServer.getTTSs(apiServerUrl, apiKey);
      const stts = await PersoUtilServer.getSTTs(apiServerUrl, apiKey);
      const styles = await PersoUtilServer.getModelStyles(apiServerUrl, apiKey);
      const prompts = await PersoUtilServer.getPrompts(apiServerUrl, apiKey);
      const selectedStyle = styles.find((s) => s.platform_type === "webrtc");
      const selectedPrompt = prompts[0];
      // Fail with a clear message instead of a TypeError on `.name` below.
      if (!selectedStyle || !selectedPrompt) {
        throw new Error("No WebRTC model style or prompt is available for this API key");
      }
      const sessionId = await createSessionId(apiServerUrl, apiKey, {
        using_stf_webrtc: true,
        model_style: selectedStyle.name,
        prompt: selectedPrompt.prompt_id,
        llm_type: llms[0].name,
        tts_type: ttss[0].name,
        stt_type: stts[0].name,
      });
      const introMessage = await getIntroMessage(
        apiServerUrl,
        apiKey,
        selectedPrompt.prompt_id,
      );
      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify({ apiServerUrl, sessionId, introMessage }));
    } catch (e) {
      res.statusCode = 500;
      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify({ error: String(e) }));
    }
    return;
  }
  res.statusCode = 404;
  res.end();
});
server.listen(3000, () => {
  console.log("Session endpoint: http://localhost:3000/api/session");
});
4. index.html (served by Vite)
<!doctype html>
<html>
  <body>
    <video id="avatar" autoplay playsinline width="480" height="640"></video>
    <p><span id="status">Ready</span></p>
    <ul id="log"></ul>
    <input id="msg" placeholder="Type a message..." />
    <button id="send">Send</button>
    <button id="mic">Speak</button>
    <script type="module" src="/main.js"></script>
  </body>
</html>
5. main.js: fetches a sessionId, creates the session, wires up UI
import { createSession, ChatState } from "perso-interactive-sdk-web/client";
// Fetch a fresh sessionId from our own server (keeps API key off the client).
const { apiServerUrl, sessionId } = await fetch(
  "http://localhost:3000/api/session",
).then((r) => r.json());
// 5-arg modern signature: (apiServer, sessionId, width, height, clientTools)
const session = await createSession(apiServerUrl, sessionId, 480, 640, []);
session.setSrc(document.getElementById("avatar"));
// Chat transcript: newest entry at [0]
session.subscribeChatLog((log) => {
  document.getElementById("log").innerHTML = log
    .slice()
    .reverse()
    .map(
      (c) => `<li><b>${c.isUser ? "You" : "Avatar"}:</b> ${c.text}</li>`,
    )
    .join("");
});
// Pipeline state: an empty Set means fully idle / ready for input
session.subscribeChatStates((states) => {
  const el = document.getElementById("status");
  if (states.size === 0) el.textContent = "Ready";
  else if (states.has(ChatState.SPEAKING)) el.textContent = "Avatar is speaking…";
  else if (states.has(ChatState.RECORDING)) el.textContent = "Listening…";
  else el.textContent = "Thinking…";
});
// Send a text message
document.getElementById("send").addEventListener("click", async () => {
  const input = document.getElementById("msg");
  const text = input.value.trim();
  if (!text) return;
  input.value = "";
  await session.processChat(text);
});
// Toggle microphone-driven voice input
let recording = false;
document.getElementById("mic").addEventListener("click", async () => {
  if (!recording) {
    await session.startProcessSTT();
    recording = true;
    document.getElementById("mic").textContent = "Stop";
  } else {
    await session.stopProcessSTT();
    recording = false;
    document.getElementById("mic").textContent = "Speak";
  }
});
6. Run
# Terminal A — session server
PERSO_INTERACTIVE_API_KEY=pak-xxxxxxxx pnpm run server
# Terminal B — Vite dev server
pnpm run dev
Open the Vite URL (default http://localhost:5173) and you have a working avatar with text + voice chat. Replace pieces as your stack requires (e.g. swap Node for Next.js API routes, or Vite for Webpack); the public contract is just GET /api/session returning { apiServerUrl, sessionId }.
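One detail worth hardening before reusing main.js: it interpolates c.text straight into innerHTML, so any markup in a user or avatar message would be parsed as HTML. A minimal sketch of an escaped renderer follows. The helper names escapeHtml and renderChatLog are illustrative, not SDK functions, and the sketch assumes log entries carry the isUser and text fields used in main.js.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the <li> markup for a chat log whose newest entry is at index 0.
// Assumes each entry has the `isUser` and `text` fields used in main.js.
function renderChatLog(log) {
  return log
    .slice()
    .reverse() // oldest first, so the page reads top to bottom
    .map(
      (c) =>
        `<li><b>${c.isUser ? "You" : "Avatar"}:</b> ${escapeHtml(c.text)}</li>`,
    )
    .join("");
}
```

Inside the subscribeChatLog callback, the assignment would then become document.getElementById("log").innerHTML = renderChatLog(log).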
SvelteKit Demo
apps/svelte (@perso-interactive-sdk-web/app-svelte)
The SvelteKit demo demonstrates server-side session creation and is recommended if you need:
- Secure API key handling
- Session configuration
- SSR-compatible architecture
Configuration
Before running the demo, configure your API key.
Create apps/svelte/.env and set the environment variable:
PERSO_INTERACTIVE_API_KEY=YOUR_API_KEY
Or set it in a constants file:
// src/lib/constant.ts
export const persoInteractiveApiKey = "YOUR_API_KEY";
Session creation is handled on the server in:
src/routes/session/+server.ts
This endpoint:
- Fetches available models and settings
- Creates a Perso session ID
- Returns the session ID to the client
Run it with:
pnpm svelte
The app starts on http://localhost:5173. When the page loads, a session is automatically created via the /session route and the avatar connects over WebRTC.
Next.js Demo
apps/nextjs (@perso-interactive-sdk-web/app-nextjs)
The Next.js demo is a React-based example with server-side session creation via the App Router.
Use this demo if you want:
- A React integration with Next.js App Router
- Server-side API key protection
- A production-like reference for SDK usage
Configuration
Create a .env.local file in apps/nextjs/:
# .env.local
PERSO_INTERACTIVE_API_KEY=YOUR_API_KEY
Run it with:
pnpm nextjs
The app starts on http://localhost:5174. When the page loads, a session is automatically created via the /api/session route and the avatar connects over WebRTC.
Vanilla JavaScript Demo
apps/vanilla (@perso-interactive-sdk-web/app-vanilla)
The Vanilla demo is a minimal HTML + JavaScript example powered by Vite.
Use this demo if you want:
- The simplest possible integration
- No framework dependencies
- A quick reference for SDK usage
Run it with:
pnpm vanilla
The app starts on http://localhost:5173. When the page loads, enter your API server URL and API key, configure session settings, and press START to connect the avatar over WebRTC.
If port 5173 is already in use (for example, if you are running the SvelteKit or TypeScript demo concurrently), Vite will automatically pick the next available port — check the terminal output for the actual URL.
TypeScript Demo
apps/typescript (@perso-interactive-sdk-web/app-typescript)
The TypeScript demo is identical to the Vanilla demo in behavior and UI, but adds:
- Full SDK typings
- Compile-time safety
- Better IDE support
Run it with:
pnpm typescript
The app starts on http://localhost:5173. When the page loads, enter your API server URL and API key, configure session settings, and press START to connect the avatar over WebRTC.
If port 5173 is already in use (for example, if you are running another demo concurrently), Vite will automatically pick the next available port — check the terminal output for the actual URL.
SDK Reference
Available SDK Utilities
Client vs Server module
The SDK exposes different helpers depending on which subpath you import from.
- perso-interactive-sdk-web/client: use in the browser. Exports top-level helpers like getAllSettings, getLLMs, getTTSs, getSTTs, getModelStyles, getPrompts, getDocuments, getBackgroundImages, getMcpServers, getTextNormalizations, getSessionInfo, plus createSession, createSessionId, ChatTool, ChatState, etc.
- perso-interactive-sdk-web/server: use in Node/server environments. Exports createSessionId, getIntroMessage, getSessionTemplates, getSessionTemplate, ApiError, and the PersoUtilServer class (static methods for fetching options). There is no getAllSettings on the server subpath; use PersoUtilServer static methods instead.
Fetching configuration (browser)
You can retrieve configuration options in two ways from perso-interactive-sdk-web/client.
1. All-in-one (getAllSettings)
import { getAllSettings } from "perso-interactive-sdk-web/client";
const settings = await getAllSettings(apiServerUrl, apiKey);
// Response shape (9 keys):
// {
// llms, // list from getLLMs()
// modelStyles, // list from getModelStyles()
// ttsTypes, // list from getTTSs()
// sttTypes, // list from getSTTs()
// prompts, // list from getPrompts()
// documents, // list from getDocuments()
// backgroundImages, // list from getBackgroundImages()
// mcpServers, // list from getMcpServers()
// textNormalizations, // list from getTextNormalizations()
// }
Note: getAllSettings returns ttsTypes / sttTypes (not ttss / stts). The key names differ from the individual getter function names.
2. Individual getters
Call only the helpers you need:
- getLLMs
- getModelStyles
- getBackgroundImages
- getTTSs
- getSTTs
- getPrompts
- getDocuments
- getMcpServers
- getTextNormalizations
- getSessionInfo
All helpers accept (apiServerUrl: string, apiKey: string) and return typed JSON responses.
Fetching configuration (server)
On the server, use the PersoUtilServer class — its methods are static and mirror the client-side helpers:
import { PersoUtilServer } from "perso-interactive-sdk-web/server";
const llms = await PersoUtilServer.getLLMs(apiServerUrl, apiKey);
const styles = await PersoUtilServer.getModelStyles(apiServerUrl, apiKey);
const ttss = await PersoUtilServer.getTTSs(apiServerUrl, apiKey);
const stts = await PersoUtilServer.getSTTs(apiServerUrl, apiKey);
const prompts = await PersoUtilServer.getPrompts(apiServerUrl, apiKey);
// Also available: getDocuments, getBackgroundImages, getMcpServers,
// getTextNormalizations, getSessionTemplates, getSessionTemplate, getSessionInfo
Use the server approach to keep your API key out of the browser.
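The five lookups above are awaited one after another. If they turn out to be independent of each other (an assumption; this guide does not state whether concurrent calls are supported or rate-limited), they could be issued together. The sketch below takes the getter container as a parameter so it can be run with a stub. The function name fetchSessionOptions is hypothetical; in a real server you would pass PersoUtilServer as the first argument.

```javascript
// Hypothetical helper: runs the five option lookups concurrently.
// `util` must expose the static getters shown above (e.g. PersoUtilServer).
// Assumption: the getters do not depend on each other's results.
async function fetchSessionOptions(util, apiServerUrl, apiKey) {
  const [llms, ttss, stts, styles, prompts] = await Promise.all([
    util.getLLMs(apiServerUrl, apiKey),
    util.getTTSs(apiServerUrl, apiKey),
    util.getSTTs(apiServerUrl, apiKey),
    util.getModelStyles(apiServerUrl, apiKey),
    util.getPrompts(apiServerUrl, apiKey),
  ]);
  return { llms, ttss, stts, styles, prompts };
}
```

Promise.all rejects as soon as any one lookup fails, so a single try/catch around the call covers all five requests.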
Error Handling
The SDK provides an ApiError class for HTTP failures.
It includes:
- status
- code
- detail
- attr
Use this to map API errors to user-facing messages or retry logic.
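As an illustration of that mapping, the sketch below turns an ApiError-shaped object into a message and a retry hint. Only the field names (status, code, detail, attr) come from this guide. The function name describeApiError, the status-code policy, and the message wording are assumptions made for the example, and the function duck-types on the fields so it runs without the SDK.

```javascript
// Hypothetical mapping from an ApiError-shaped object to a UI decision.
// The status-code policy below is an assumption for illustration only.
function describeApiError(err) {
  const status = typeof err?.status === "number" ? err.status : 0;
  const detail = err?.detail ? String(err.detail) : "Unknown error";
  if (status === 401 || status === 403) {
    return {
      retry: false,
      message: "The API key was rejected. Check your credentials.",
    };
  }
  if (status === 429 || status >= 500) {
    return {
      retry: true,
      message: "The service is busy. Please try again shortly.",
    };
  }
  // Other failures (for example a 400 for missing fields) are shown as-is.
  return { retry: false, message: `Request failed: ${detail}` };
}
```

In application code you would typically guard with err instanceof ApiError before calling a helper like this, and fall back to a generic message for anything else.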
Session Flow Overview
- Collect the Perso Interactive API server URL and API key from the operator.
- Fetch configuration options for the UI. Use either approach below depending on your needs:
  - Option A (recommended, simpler): getAllSettings(apiServerUrl, apiKey), imported from perso-interactive-sdk-web/client, returns every config (LLM, TTS, STT, model style, prompt, document, background, MCP servers, text normalizations) in one call. Ideal for initial UI load. On the server, use PersoUtilServer.getLLMs() etc. individually.
  - Option B (fine-grained): Call only the individual getters you need: getLLMs(), getTTSs(), getSTTs(), getModelStyles(), getPrompts(), getDocuments(), getBackgroundImages(), getMcpServers(), getTextNormalizations(). Use this when you want to configure only a subset, or refresh specific options selectively.
- When the user clicks START, invoke createSessionId on the server to obtain a fresh sessionId, deliver it to the browser (for example, via a GET /api/session endpoint your app exposes), then call createSession in the browser to obtain a Session object, and finally call session.setSrc(videoElement) to bind the WebRTC media stream to a <video> element.
- Drive the conversation and UI via Session methods:
  - session.processChat(message): send a user message; the SDK runs the LLM, TTS, and avatar animation.
  - session.processTTSTF(message): make the avatar speak an exact string, bypassing the LLM.
  - session.subscribeChatLog((log) => ...): render the chat transcript (newest entry at index [0]).
  - session.subscribeChatStates((states) => ...): react to state changes. An empty Set means the pipeline is fully idle; otherwise use the specific members (ChatState.SPEAKING, ChatState.RECORDING, ChatState.ANALYZING, etc.) to show busy indicators or disable UI.
  - session.startProcessSTT() / session.stopProcessSTT(): enable microphone-driven voice input.
- Provide ChatTool instances (see Tool Calling) for app-specific actions, and handle SDK errors via the provided callbacks.
Quick Look
Server Side
1. Create Session ID
Required fields when using_stf_webrtc: true
When creating a session with using_stf_webrtc: true, the server requires llm_type, tts_type, stt_type, and a prompt (prompt ID). Omitting any of them returns a 400 response such as "Prompt or Agent is required for Capability LLM".
Use the values returned from PersoUtilServer (server) or getAllSettings (client) to populate these fields. Don't hardcode them, as the available set varies per account.
Field value shapes (server-side)
The option getters return objects — the session params expect specific fields from those objects:
- model_style: the model style's name (e.g. "indian_m_6_rajesh-front-ivory_shirt-earnest"). For WebRTC sessions, filter to styles where platform_type === "webrtc".
- prompt: the prompt's prompt_id (e.g. "plp-ce0cd928..."). Not the prompt's name or id.
- llm_type, tts_type, stt_type: the corresponding object's name field.
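The field mapping above can be collected into one small function, sketched below. The name buildSessionParams is hypothetical, and picking the first entry of each list is just a placeholder policy; the object shapes (name, prompt_id, platform_type) are the ones described in this section.

```javascript
// Hypothetical helper: turns the option lists into createSessionId params.
function buildSessionParams({ llms, ttss, stts, styles, prompts }) {
  // Every list must contain at least one entry before we index into it.
  const required = { llms, ttss, stts, styles, prompts };
  for (const [name, list] of Object.entries(required)) {
    if (!Array.isArray(list) || list.length === 0) {
      throw new Error(`No entries returned for ${name}`);
    }
  }
  const style = styles.find((s) => s.platform_type === "webrtc");
  if (!style) throw new Error("No WebRTC-compatible model style is available");
  return {
    using_stf_webrtc: true,
    model_style: style.name,
    prompt: prompts[0].prompt_id, // prompt_id, not name or id
    llm_type: llms[0].name,
    tts_type: ttss[0].name,
    stt_type: stts[0].name,
  };
}
```

Keeping this logic in one place makes it easier to fail early with a readable message rather than sending an incomplete request and receiving a 400.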
sessionId is single-use
Each sessionId returned by createSessionId is consumed by the first successful createSession call that uses it. Re-using a sessionId (for example, after a page reload, a failed negotiation, or an app reconnect) returns 400: ICE server data is only available in created status.
Generate a new sessionId on the server every time the browser starts a fresh createSession. The Minimum runnable example above does this by re-fetching GET /api/session on each page load.
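One way to honor the single-use rule in reconnect logic is to request a new sessionId inside the retry loop rather than outside it. The sketch below assumes two caller-supplied async functions: fetchSessionId (for example, a wrapper around GET /api/session) and connect (for example, a wrapper around createSession). Both names, and the retry count, are illustrative.

```javascript
// Hypothetical wrapper: every attempt requests a brand-new sessionId, so a
// failed negotiation never reuses an id that has already been consumed.
async function connectWithFreshSession(fetchSessionId, connect, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const sessionId = await fetchSessionId();
    try {
      return await connect(sessionId);
    } catch (err) {
      lastError = err; // discard this sessionId and loop to fetch another
    }
  }
  throw lastError ?? new Error("maxAttempts must be at least 1");
}
```

In the browser, connect could be (id) => createSession(apiServerUrl, id, 480, 640, []), using the same arguments as main.js.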
// Import from server subpath
import { createSessionId, getIntroMessage, PersoUtilServer } from "perso-interactive-sdk-web/server";
// 1. Initialize SDK
const apiServerUrl = "https://platform.perso.ai";
const apiKey = "YOUR API KEY";
// 2. Fetch available options (server-side: PersoUtilServer static methods)
const llms = await PersoUtilServer.getLLMs(apiServerUrl, apiKey);
const ttss = await PersoUtilServer.getTTSs(apiServerUrl, apiKey);
const stts = await PersoUtilServer.getSTTs(apiServerUrl, apiKey);
const styles = await PersoUtilServer.getModelStyles(apiServerUrl, apiKey);
const prompts = await PersoUtilServer.getPrompts(apiServerUrl, apiKey);
// 3. Pick WebRTC-compatible values
const selectedStyle = styles.find(s => s.platform_type === "webrtc");
const selectedPrompt = prompts[0];
// 4. Create session id with configuration
const sessionId = await createSessionId(apiServerUrl, apiKey, {
  using_stf_webrtc: true,
  model_style: selectedStyle.name,
  prompt: selectedPrompt.prompt_id, // use prompt_id, not id/name
  llm_type: llms[0].name,
  tts_type: ttss[0].name,
  stt_type: stts[0].name,
  // Optional:
  // document, background_image, mcp_servers,
  // padding_left, padding_top, padding_height,
});
// 5. Get intro message (optional): used for your own UI, not passed to createSession
const introMessage = await getIntroMessage(apiServerUrl, apiKey, selectedPrompt.prompt_id);
// Deliver { apiServerUrl, sessionId, introMessage } to the browser,
// e.g. as the JSON response of your `GET /api/session` endpoint,
// and call createSession() from there. Do NOT expose apiKey to the browser.
// (The return below assumes this code runs inside your request handler.)
return { apiServerUrl, sessionId, introMessage };
2. Create Session WebRTC (Browser)
// Import from client subpath
import { createSession } from "perso-interactive-sdk-web/client";
// Create WebRTC session (modern 5-arg signature)
const session = await createSession(
  apiServerUrl,
  sessionId,
  chatbotWidth, // pixels (e.g. 1080)
  chatbotHeight, // pixels (e.g. 1920)
  clientTools ?? [], // see Tool Calling Example
);
The introMessage returned by getIntroMessage() is not passed to createSession. Use it in your own UI (for example, render it as the avatar's first message in the transcript).
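A small sketch of that idea: since the chat log is delivered newest-first, an intro line that should appear as the avatar's first message belongs at the end of the array. The helper name withIntro is hypothetical, and the entry shape ({ isUser, text }) is assumed from the main.js example rather than from a documented type.

```javascript
// Hypothetical helper: returns a copy of the transcript with the intro
// message added as the oldest entry. The log is newest-first, so the
// oldest entry sits at the end of the array.
function withIntro(log, introMessage) {
  if (!introMessage) return log.slice();
  return [...log, { isUser: false, text: introMessage }];
}

const shown = withIntro([{ isUser: true, text: "Hello" }], "Welcome!");
console.log(shown.length); // 2
console.log(shown[shown.length - 1].text); // "Welcome!"
```

Calling this on the array received in subscribeChatLog, just before rendering, keeps the intro visible without modifying the SDK's own log.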
Mounting the avatar to a video element
createSession does not attach to the DOM on its own. It returns a session object; you then bind that session to a <video> element by calling session.setSrc(videoElement).
// 1. Place a <video> element in your page
// autoplay and playsinline are recommended for WebRTC media
// <video id="avatar" autoplay playsinline></video>
// 2. Create the session (DOM-independent, modern 5-arg signature)
const session = await createSession(
  apiServerUrl,
  sessionId,
  chatbotWidth, // width in pixels (e.g. 1080)
  chatbotHeight, // height in pixels (e.g. 1920)
  clientTools ?? [],
);
// 3. Bind the session's media stream to your <video>
const videoElement = document.getElementById("avatar") as HTMLVideoElement;
session.setSrc(videoElement);
Notes:
- chatbotWidth and chatbotHeight are in pixels.
- Audio is delivered on the video element's audio track (WebRTC MediaStream). No separate <audio> element is required for voice chat.
- For complete integration patterns (error handling, chat-state subscription, UI components), see the SvelteKit, Next.js, Vanilla, or TypeScript demos in the SDK repository.
Sending messages and receiving replies
Once the avatar video is mounted, drive the conversation through methods on the Session object returned by createSession. The SDK handles the LLM call, TTS synthesis, and avatar animation — your app only needs to send text and render the updated chat log.
| Method | Purpose |
|---|---|
| session.processChat(message) | Send a user message. Runs through LLM → TTS → avatar (normal conversation). |
| session.processTTSTF(message) | Make the avatar speak an exact string, bypassing the LLM (scripted greetings, system notices, etc.). |
| session.subscribeChatLog((log) => ...) | Receive the full chat transcript on every update. The newest entry is at log[0]. |
| session.subscribeChatStates((states) => ...) | Receive a Set<ChatState> on every pipeline state change. Check states.size === 0 first for idle/ready; some stages can linger in the set briefly, so matching specific members first can leave the UI stuck on "Thinking…" after a reply finishes. |
For a complete, runnable wiring of these APIs (HTML + event handlers + state UI), see the main.js in Minimum runnable example above.
Voice chat (optional)
To let the user talk to the avatar via microphone instead of typing, use the SDK's built-in STT pipeline. Transcribed speech is automatically routed through processChat, so the reply flows through the same subscribeChatLog callback as text input.
| Method | Purpose |
|---|---|
| session.startProcessSTT() | Start capturing microphone input and transcribe it. Triggers the browser mic permission prompt on first use. |
| session.stopProcessSTT() | Stop capture. |
Use ChatState.RECORDING in subscribeChatStates to render a live "listening" indicator. A mic-toggle button wired to these two calls is shown in main.js of Minimum runnable example.
ChatState values
ChatState is an enum exported from perso-interactive-sdk-web/client. subscribeChatStates hands you a Set<ChatState> (multiple states can be active at once) so your UI can reflect exactly what the pipeline is doing:
| Value | Meaning |
|---|---|
| (empty Set, states.size === 0) | Idle / ready: the pipeline has finished all work. Accept new input. |
| ChatState.IDLE | Explicit idle marker (some SDK versions expose this alongside the empty set). |
| ChatState.RECORDING | Microphone is capturing user speech (startProcessSTT is active). |
| ChatState.ANALYZING | LLM is generating the response. |
| ChatState.SPEAKING | TTS audio is playing and the avatar is animating. |
| ChatState.TTS | TTS synthesis in progress (often overlaps with SPEAKING). |
| ChatState.LLM | LLM streaming in progress. |
Important: always check states.size === 0 before the individual flags. Some stages (e.g. ANALYZING) can remain in the set briefly before the pipeline fully drains, so matching specific members first can leave your UI stuck on "Thinking…" after a reply has already finished.
Use these to enable/disable the Send button, show a spinner, or gate further user input.
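A minimal sketch of that gating logic follows. To keep it runnable without the SDK, it declares a local stand-in for ChatState whose string values are placeholders, not the SDK's actual enum values. The helper name deriveUiFlags and the returned flag names are also hypothetical.

```javascript
// Stand-in for the SDK's ChatState enum so this sketch runs on its own.
// In an app, import ChatState from "perso-interactive-sdk-web/client";
// the string values below are placeholders, not the SDK's actual values.
const ChatState = Object.freeze({
  RECORDING: "RECORDING",
  ANALYZING: "ANALYZING",
  SPEAKING: "SPEAKING",
});

// Hypothetical helper: derive UI flags from the Set passed to subscribeChatStates.
function deriveUiFlags(states) {
  // The idle check comes first, following the recommendation in this section.
  if (states.size === 0) {
    return { canSend: true, showSpinner: false, label: "Ready" };
  }
  if (states.has(ChatState.SPEAKING)) {
    return { canSend: false, showSpinner: false, label: "Avatar is speaking…" };
  }
  if (states.has(ChatState.RECORDING)) {
    return { canSend: false, showSpinner: false, label: "Listening…" };
  }
  // Anything else (for example ANALYZING) is treated as "working".
  return { canSend: false, showSpinner: true, label: "Thinking…" };
}

console.log(deriveUiFlags(new Set()).label); // "Ready"
console.log(deriveUiFlags(new Set([ChatState.ANALYZING])).label); // "Thinking…"
```

The Send button's disabled property can then be set to !canSend inside the subscribeChatStates callback.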
Client Side
Warning: Using createSessionId on the client side is not recommended. It exposes your API key in the browser, making it vulnerable to theft. If your API key is compromised due to a client-side implementation, the SDK provider assumes no responsibility. For security, use server-side session creation instead.
1. Create Session ID + Create Session WebRTC
// Import from client subpath
import {
  getAllSettings,
  createSessionId,
  createSession,
  ChatTool,
  ChatState,
} from "perso-interactive-sdk-web/client";
// 1. Initialize SDK
const apiServerUrl = "https://platform.perso.ai";
const apiKey = "YOUR API KEY";
// 2. Fetch available options (client-side: top-level helpers)
const settings = await getAllSettings(apiServerUrl, apiKey);
const selectedStyle = settings.modelStyles.find(s => s.platform_type === "webrtc");
const selectedPrompt = settings.prompts[0];
// 3. Create session id
const sessionId = await createSessionId(apiServerUrl, apiKey, {
  using_stf_webrtc: true,
  model_style: selectedStyle.name,
  prompt: selectedPrompt.prompt_id,
  llm_type: settings.llms[0].name,
  tts_type: settings.ttsTypes[0].name,
  stt_type: settings.sttTypes[0].name,
});
// 4. Create WebRTC Session (modern 5-arg signature)
const session = await createSession(
  apiServerUrl,
  sessionId,
  chatbotWidth, // pixels (e.g. 1080)
  chatbotHeight, // pixels (e.g. 1920)
  clientTools ?? [],
);
// 5. Mount + converse using session.setSrc / processChat / subscribeChatLog
// exactly as shown in the Server Side "Sending messages and receiving replies" section.
Tool Calling Example
Client-side tool calling allows the model to trigger application-specific actions.
A reference implementation can be found here:
🔗 Perso Interactive Web SDK Tool Calling→
Web SDK API Reference
For detailed API documentation, see this repository:
🔗 Perso Interactive Web SDK API Reference→
Learn about Perso Interactive On-Device SDK in the next section.
