
What are Satellites?

Formal Satellites are optional specialized containers you deploy alongside Connectors to enable advanced capabilities like PII detection, schema discovery, and custom policy data loading. Satellites extend Formal’s core functionality without adding complexity to the Connector itself. They’re deployed in your infrastructure and communicate with Connectors and the Control Plane.

Satellite Types

AI

Detects PII, PHI, and other sensitive data in real time for automatic redaction and classification. Enables real-time threat detection and mitigation for SSH and Kubernetes sessions.

Data Discovery

Catalogs database schemas, tables, and columns across your data infrastructure

Policy Data Loader

Loads external data into policies using custom code in Python or Node.js

AI Satellite

Identifies Personally Identifiable Information (PII) and Protected Health Information (PHI) in database responses, enabling automatic data masking and classification policies. Also enables real-time threat detection and mitigation for SSH and Kubernetes sessions.

Features

  • Real-time PII/PHI detection on query responses
  • Threat detection and mitigation for SSH and Kubernetes sessions
  • Automatic labeling of columns and fields with PII/PHI types
  • Integration with policies for conditional masking
  • GPU acceleration for high-throughput processing (optional, recommended for production)

Configuration

Required Environment Variables:
  • FORMAL_CONTROL_PLANE_API_KEY: Satellite authentication token
GPU Support: The AI Satellite can leverage NVIDIA GPUs for improved performance. Use --gpus all to enable GPU acceleration:
# Run with GPU support (necessary to parallelize across GPUs)
docker run -d \
  --gpus all \
  -e FORMAL_CONTROL_PLANE_API_KEY=<your-api-key> \
  654654333078.dkr.ecr.us-west-2.amazonaws.com/formalco-prod-data-classifier:latest
The satellite automatically detects and configures available GPUs for optimal performance.

Data Discovery Satellite

Automatically discovers and catalogs your database schemas, tables, columns, and relationships.

Features

  • Scheduled schema discovery across all resources
  • PII/PHI classification integration (requires AI Satellite)
  • Schema change tracking with deletion policies

Configuration

Environment variables:
  • FORMAL_CONTROL_PLANE_API_KEY: Satellite authentication token
  • DATA_CLASSIFIER_SATELLITE_URI: URI of AI Satellite (e.g., localhost:50055)

Schema Discovery Jobs

Configure discovery schedules per resource:
  • Frequency: None, every 6/12/18/24 hours, or custom cron
  • Deletion policy: Mark for deletion or auto-delete removed schemas
  • Native user: Which credentials to use for discovery

Policy Data Loader Satellite

Enables custom code to load data from external sources into your policies, extending policy evaluation with dynamic business logic.

Features

  • Custom code execution in Python 3.11 or Node.js 18
  • Scheduled runs with cron expressions
  • External API calls to fetch data
  • JSON output accessible in policies via data object
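The contract is simple: a loader is a script that prints a JSON document to stdout, and that output is what policies see in the `data` object. A minimal sketch (the field name `allowed_regions` and its values are illustrative, not a Formal convention):

```python
import json

# Whatever the loader prints to stdout becomes available
# to policies via the `data` object.
allowed_regions = ["us-east-1", "eu-west-1"]  # illustrative values

print(json.dumps({"allowed_regions": allowed_regions}))
```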

Supported Runtimes

Runtime     | Identifier | Available Libraries
Python 3.11 | python3.11 | requests, httpx
Node.js 18  | nodejs18.x | lodash, axios

Example: Load Zendesk Tickets for Contextual Data

This example fetches open Zendesk tickets and enriches them with user information for use in policies:
import asyncio
import json
import os

import httpx

ZENDESK_SUBDOMAIN = os.environ["ZENDESK_SUBDOMAIN"]
ZENDESK_EMAIL = os.environ["ZENDESK_EMAIL"]
ZENDESK_API_TOKEN = os.environ["ZENDESK_API_TOKEN"]

auth = httpx.BasicAuth(ZENDESK_EMAIL, ZENDESK_API_TOKEN)
base_url = f"https://{ZENDESK_SUBDOMAIN}.zendesk.com"

users_cache: dict[str, dict | None] = {}

async def get_user(user_id) -> dict | None:
    if user := users_cache.get(user_id):
        return user

    async with httpx.AsyncClient(base_url=base_url, auth=auth) as client:
        response = await client.get(f"/api/v2/users/{user_id}.json")

    response.raise_for_status()
    user = response.json()["user"]
    users_cache[user_id] = user
    return user

async def get_tickets() -> list[dict]:
    async with httpx.AsyncClient(base_url=base_url, auth=auth) as client:
        params = {"query": "status:new status:open status:pending"}
        response = await client.get("/api/v2/search.json", params=params)

    response.raise_for_status()
    return response.json()["results"]

async def main():
    tickets = await get_tickets()

    for ticket in tickets:
        if requester_id := ticket.get("requester_id"):
            ticket["requester"] = await get_user(requester_id)
        if submitter_id := ticket.get("submitter_id"):
            ticket["submitter"] = await get_user(submitter_id)
        if assignee_id := ticket.get("assignee_id"):
            ticket["assignee"] = await get_user(assignee_id)

        ticket["url"] = f"{base_url}/agent/tickets/{ticket['id']}"

    print(json.dumps(tickets))

asyncio.run(main())

Using in Policies

The Policy Data Loader outputs JSON data that becomes available in policies via the data object. Here’s how to use the Zendesk tickets data in a policy:
package formal.v2

import future.keywords.if

default post_request := {"action": "filter", "reason": "No tickets are open for the given row"}

post_request := {
    "action": "allow",
    "contextual_data": filteredTickets,
    "reason": "At least one ticket is open for the given row"
} if {
    col := input.row[_]
    col["path"] == "postgres.public.pii.email"
    filteredTickets := [obj |
        obj := data.zendesk_tickets[_]
        obj.requester.email == col.value
    ]
    count(filteredTickets) > 0
}
This policy:
  • Filters database rows by checking if there are open Zendesk tickets for the email address
  • Allows access with contextual ticket data when tickets exist
  • Blocks access when no tickets are found for the email
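For intuition, the comprehension in the policy above corresponds to the following Python filter. This is an illustration only (the actual evaluation happens in Rego inside the Connector); the sample tickets are made up:

```python
def filter_tickets(tickets: list[dict], row_email: str) -> list[dict]:
    """Mirror of the Rego comprehension: keep tickets whose
    requester email matches the email value in the current row."""
    return [
        t for t in tickets
        if t.get("requester", {}).get("email") == row_email
    ]

# Illustrative data shaped like the loader's output.
tickets = [
    {"id": 1, "requester": {"email": "alice@example.com"}},
    {"id": 2, "requester": {"email": "bob@example.com"}},
]

# A non-empty result corresponds to the policy's "allow" branch;
# an empty one falls through to the default "filter" action.
matches = filter_tickets(tickets, "alice@example.com")
```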

Schedule Format

Policy Data Loaders use second-based cron expressions:
Expression     | Description
* * * * * *    | Every second
*/30 * * * * * | Every 30 seconds
0 * * * * *    | Every minute
0 */5 * * * *  | Every 5 minutes
0 30 8 * * *   | Daily at 8:30 AM
Format: second minute hour day month year
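To make the six-field layout concrete, here is a small sketch that splits an expression into named fields. The field names follow the format line above; this is not Formal's parser, just an illustration:

```python
# Field order from the documented format line.
FIELDS = ["second", "minute", "hour", "day", "month", "year"]

def describe(expr: str) -> dict[str, str]:
    """Split a six-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

describe("0 30 8 * * *")
# {'second': '0', 'minute': '30', 'hour': '8', 'day': '*', 'month': '*', 'year': '*'}
```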

Configuration

Environment variables:
  • FORMAL_CONTROL_PLANE_API_KEY: Satellite authentication token
  • Custom variables: Available to your code
The Satellite passes all its environment variables to worker processes, so you can use environment variables in your code (e.g., API keys, endpoints).
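In practice that means loader code reads its configuration with ordinary environment lookups, as the Zendesk example above does. A minimal pattern (the variable names `MY_SERVICE_API_TOKEN` and `MY_SERVICE_ENDPOINT` are illustrative):

```python
import os

def load_config(env: dict = os.environ) -> dict:
    """Read loader configuration from environment variables.
    Names here are illustrative, not required by Formal."""
    token = env.get("MY_SERVICE_API_TOKEN")
    if token is None:
        # Fail fast with a clear error instead of a KeyError mid-run.
        raise RuntimeError("MY_SERVICE_API_TOKEN is not set")
    # Optional variable with a sensible default.
    endpoint = env.get("MY_SERVICE_ENDPOINT", "https://api.example.com")
    return {"token": token, "endpoint": endpoint}
```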

Deployment

Satellites are Docker containers deployed in your infrastructure, similar to Connectors.

Prerequisites

  1. Create the Satellite in the Formal console
  2. Copy the API token
  3. Deploy the container with appropriate environment variables
Satellites can be deployed on AWS ECS Fargate, Kubernetes, or plain Docker. See the AWS Satellite deployment example for Terraform configuration.
By default, the satellite communicates with the Connectors using a TLS certificate issued by the Control Plane.

Spaces and Satellites

Like Connectors and Resources, Satellites can be assigned to Spaces:
  • Satellite with a Space: Only communicates with Connectors and Resources in the same Space
  • Satellite without a Space: Can communicate with any Connector or Resource
Changing a Satellite’s Space requires restarting the Satellite container.

Managing Satellites

Creating a Satellite

  1. Navigate to Satellites: go to Satellites in the console
  2. Select type: choose AI Satellite, Data Discovery, or Policy Data Loader
  3. Configure settings:
     • Name: Friendly identifier
     • Space: (Optional) Logical grouping
  4. Copy API token: save the token for deployment
  5. Deploy container: use the token in your deployment (ECS, Kubernetes, Docker)

Policy Data Loader Status

  • Draft: Not running; code is being edited
  • Active: Running and loading data on schedule
Activate after testing your code to make data available to policies.
