Capabilities
What an agent can actually do: tools, skills, browser use, sub-agents, document generation, voice and image.
Tools
Tools are the things the agent can do, as opposed to things it knows. Every agent ships with a built-in toolbelt; you can add custom tools or scope which built-ins each agent has.
Built-in tools
- Shell: run shell commands inside the agent’s container.
- File IO: read, write, edit, search files inside the container.
- Browser: navigate, click, type, screenshot. See Browser use.
- Sub-agent spawn: launch a scoped agent for a subtask.
- Document generation: render markdown into publication-quality PDFs. See Document generation.
- Voice: speech-to-text and text-to-speech for phone-call and iMessage triggers.
- Image: generate and edit images.
- Memory read/write: access the agent’s own memory store. Always on.
- HTTP fetch: arbitrary GET/POST against public APIs.
- Schedule: create or update tasks on the agent itself (so the agent can put work on its own calendar).
Scoping
On the agent’s Settings tab, the Tools section lists every built-in with a toggle. Disable the ones the agent shouldn’t have. A research agent probably doesn’t need shell; a code-review agent probably doesn’t need voice.
Custom tools
Add custom tools from Dashboard → Tools. Each tool is defined by a name, a JSON Schema for its input, and an endpoint URL we POST to when the agent invokes it. The endpoint’s response is fed back into the model on the next turn.
{
  "name": "lookup_customer",
  "description": "Look up a customer by email.",
  "input_schema": {
    "type": "object",
    "properties": {
      "email": { "type": "string", "format": "email" }
    },
    "required": ["email"]
  },
  "endpoint": "https://api.acme.com/agents/lookup-customer",
  "headers": {
    "X-Api-Key": "{{ACME_KEY}}"
  }
}
Secrets in headers ({{NAME}}) reference encrypted-at-rest variables you set on the Tools page. They never appear in transcripts or logs.
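For orientation, here is a minimal sketch of what the receiving end of the lookup_customer tool might look like, written with Python's standard library. It assumes the POST body is the tool input as a JSON object matching input_schema and that a JSON response is acceptable; neither detail is specified above. The CUSTOMERS data, the error shapes, and the serve helper are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for a real customer database.
CUSTOMERS = {"ada@example.com": {"id": "cus_001", "name": "Ada Lovelace"}}


def lookup_customer(payload: dict) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for one lookup_customer invocation."""
    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email:
        return 400, {"error": "email is required"}
    customer = CUSTOMERS.get(email.lower())
    if customer is None:
        return 404, {"error": "customer not found"}
    return 200, customer


class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: the request body is the tool input as a JSON object.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            payload = None
        if isinstance(payload, dict):
            status, body = lookup_customer(payload)
        else:
            status, body = 400, {"error": "request body must be a JSON object"}
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ToolHandler).serve_forever()
```

Keeping the lookup logic in a plain function, separate from the HTTP handler, makes it easy to test without starting a server. A production endpoint would also verify the X-Api-Key header before doing any work.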
Skills
A skill is a reusable playbook the agent can invoke by name. Think of it as a saved prompt with structure: a frontmatter block (name, when-to-use, allowed tools) and a markdown body that walks the agent through the procedure.
Anatomy of a skill
---
name: weekly-retro
description: Generate a weekly engineering retrospective
allowed_tools:
  - shell
  - file_io
---
# Steps
1. Run `git log --since="7 days ago"` and group commits by author.
2. Score each PR for risk and impact.
3. Surface anything that broke and got reverted.
4. Write the retro to RETRO.md.
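To make the two-part structure concrete, the sketch below splits a skill file into its frontmatter and markdown body. It is not the platform's validator, whose behavior isn't documented here. It is a hypothetical parser that handles only the shapes shown above (`key: value` pairs and a key followed by `- item` lines); a real implementation would more likely use a full YAML parser.

```python
SKILL = """---
name: weekly-retro
description: Generate a weekly engineering retrospective
allowed_tools:
  - shell
  - file_io
---
# Steps
1. Run `git log --since="7 days ago"` and group commits by author.
"""


def parse_skill(text: str) -> tuple[dict, str]:
    """Split a skill file into (frontmatter dict, markdown body)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a '---' frontmatter block")
    # Find the line that closes the frontmatter block.
    end = next((i for i, line in enumerate(lines[1:], start=1)
                if line.strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter block is not closed")

    meta: dict = {}
    current_list = None  # key whose list items are being collected, if any
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a key: {stripped!r}")
            meta[current_list].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_list = None
        else:
            meta[key] = []
            current_list = key
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Calling `parse_skill(SKILL)` returns the frontmatter as a dictionary, with `allowed_tools` as a list, alongside the markdown body that starts at `# Steps`.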
Invoking a skill
In chat, the user (or the agent itself) can write /weekly-retro and the agent loads that skill and follows it. Skills compose: one skill can invoke another.
Authoring skills
Open Dashboard → Skills and click + New skill. The editor validates the frontmatter as you type and previews the rendered markdown. Save and the agent can invoke it immediately, no restart required.
Sharing skills
Skills are scoped to your account by default. Export a skill as a single markdown file (the editor has an Export button); another user can import it from + Import skill and run it themselves.
The skill library
We ship a small library of pre-built skills (research, retrospective, code review, daily digest, …) you can install with one click. Browse them from the Skills page header.
Browser use
Backbend ships a patched Chromium for agent browsing, not a headless library that gets blocked at the first Cloudflare challenge.
What’s included
- Stealth. No navigator.webdriver, no headless fingerprints, realistic timing on input.
- Identity isolation. Each browser session gets its own profile so cookies and local storage don’t leak between accounts.
- LLM-friendly DOM. A simplified DOM tree the model can read without choking on 5 MB of nested <div>s.
- Anti-bot evasion. The same behavior real users exhibit: mouse jitter, scroll patterns, viewport variation.
Tools the browser exposes
- navigate(url): open a URL in the agent tab.
- click(selector) / type(selector, text): standard interactions.
- read_page(): structured page contents.
- screenshot(): annotated screenshots for evidence.
- read_console() / read_network(): debug data when a page misbehaves.
- fill_form(map): fill multiple fields in one shot.
- wait_for(selector|condition): pause until a condition is true.
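As an illustration of how these tools might be sequenced for one task, the sketch below records a plausible order of calls for looking up an order status. Only the tool names and their parameters come from the list above. The RecordingBrowser class, the call shape, the URL, and the selectors are hypothetical stand-ins, and nothing here drives a real browser.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingBrowser:
    """Hypothetical stand-in that records tool calls instead of running them."""
    calls: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def call(self, tool: str, **args: Any) -> None:
        self.calls.append((tool, args))


def check_order_status(browser: RecordingBrowser, order_id: str) -> None:
    """One plausible call order: open, fill, submit, wait, read, capture."""
    browser.call("navigate", url="https://shop.example.com/orders")
    browser.call("fill_form", map={"#order-id": order_id})
    browser.call("click", selector="#lookup")
    # Wait for the result to render before reading, so read_page sees it.
    browser.call("wait_for", selector="#status")
    browser.call("read_page")
    browser.call("screenshot")


browser = RecordingBrowser()
check_order_status(browser, "A-1001")
tool_order = [tool for tool, _ in browser.calls]
```

The point of the ordering is that wait_for sits between the click and read_page, so the agent reads the page only after the content it needs is present.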
Combining with sessions
For sites OAuth doesn’t cover, see Persistent browser sessions: the agent reuses your saved login on every browser tool call.
Watching the agent browse
On the agent detail page, the Browser tab shows a live view of the agent’s tab while it’s using it: you see the page the agent sees, the actions it takes, the network it triggers. Useful for debugging or just watching the work happen.
Sub-agents
Sub-agents are scoped agents the main agent (or you) can spin up for a specific job. They have their own memory and their own tools, and they return a single answer to the parent, not their whole transcript.
Why bother
Two reasons. First, context cleanliness: a research dive that reads 30 pages shouldn’t bloat the main agent’s context window. Second, parallelism: launch three sub-agents to investigate three things at once.
Lifecycle
Sub-agents persist until you delete them; they’re not ephemeral. So a sub-agent you create today for “watch competitor X pricing” keeps running on whatever schedule you give it.
Creating a sub-agent
Two paths:
- From the dashboard: click + New agent, same as for any agent, then mark it as a sub-agent of another agent in Settings.
- From the main agent in chat: ask it to “create a sub-agent called ‘research-x’ that does Y” and it will provision one for you, using the sub-agent-spawn tool.
Communication
The parent agent talks to sub-agents through a dedicated tool call. The sub-agent gets the request, runs to completion, and returns a single message back to the parent. The parent never sees the sub-agent’s internal scratch work; only the final answer.
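The request-and-single-answer pattern, combined with the parallelism mentioned earlier, can be sketched with asyncio. This is an illustration of the shape only: ask_sub_agent is a hypothetical stand-in for the parent's tool call, and the sub-agent names and requests are made up.

```python
import asyncio


async def ask_sub_agent(name: str, request: str) -> str:
    """Hypothetical stand-in for the parent's sub-agent tool call."""
    # Internal scratch work stays inside the sub-agent and is never returned.
    scratch = [f"{name}: step {i} for {request!r}" for i in range(3)]
    await asyncio.sleep(0)  # placeholder for the real work
    return f"{name} finished: {request}"


async def investigate(requests: dict[str, str]) -> dict[str, str]:
    """Fan out one request per sub-agent and collect only the final answers."""
    answers = await asyncio.gather(
        *(ask_sub_agent(name, request) for name, request in requests.items())
    )
    # gather returns results in the order the awaitables were passed in.
    return dict(zip(requests, answers))


results = asyncio.run(investigate({
    "research-a": "compare pricing pages",
    "research-b": "summarize release notes",
}))
```

The parent ends up with one string per sub-agent, which is what keeps its own context small no matter how much the sub-agents read along the way.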
Document generation
Agents can produce publication-quality documents (PDFs, slides, long-form reports) directly from a markdown source. The output is the same format you’d hand-craft in a print tool: proper margins, page numbers, running headers, curly quotes, clickable TOC.
Use cases
- Weekly engineering retrospectives shipped as a polished PDF.
- Investor updates auto-generated from the agent’s memory and metrics.
- Customer-facing reports drawn from a database query.
How it works
The agent calls the make_document tool with a markdown body and a config (page size, draft watermark, cover page, …). The tool renders the PDF, uploads it to your account’s document store, and returns a URL. You can share the URL externally or pull the file from Dashboard → Agents → name → Files.
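To give a feel for the call, the sketch below assembles arguments for make_document. Only the tool name and the three options (page size, draft watermark, cover page) come from the description above. The field names, default values, and nesting under "config" are assumptions made for illustration and may not match the real schema.

```python
def make_document_args(
    markdown: str,
    page_size: str = "A4",
    draft_watermark: bool = False,
    cover_page: bool = True,
) -> dict:
    """Assemble hypothetical arguments for a make_document call."""
    if not markdown.strip():
        raise ValueError("markdown body must not be empty")
    return {
        "markdown": markdown,
        # Hypothetical field names; the real config keys are not documented here.
        "config": {
            "page_size": page_size,
            "draft_watermark": draft_watermark,
            "cover_page": cover_page,
        },
    }


args = make_document_args("# Weekly retro\n\nShipped three PRs.", draft_watermark=True)
```

Validating the markdown body before the call avoids rendering an empty PDF and storing a file nobody wants.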
Voice and image
Voice
Voice tools cover both directions: speech-to-text (used by the phone-call trigger to transcribe what the caller said) and text-to-speech (used to reply on a phone call or read out a message). The TTS voice is configurable per agent.
Image generation
Agents can generate images via the configured LLM provider’s image API. The generated image is stored in the agent’s Files area and returned as a URL the agent can hand back to you or embed in a document.
Image understanding
Any model that supports vision (Claude 3+, GPT-4o, Gemini) accepts images as input. Drop an image into chat or pass it to a task and the agent treats it as part of the conversation.