# Lattice Network Bridge — Setup Guide

***

### Overview

Lattice uses a split architecture: your **Edge node** is where you write and run code in Jupyter, and your **Core cluster** is where heavy compute happens on GPUs via Ray. By default, both run on the same machine. This guide walks you through running them on separate machines so your laptop or workstation can orchestrate GPU workloads on a dedicated compute server.

```
Your Machine (Edge)                   GPU Server (Core)
┌──────────────────────┐              ┌──────────────────────────┐
│  Jupyter Notebook    │              │  Ray Cluster             │
│  (localhost:8889)    │    LAN       │  ├── ray-head (10001)    │
│                      │◄────────────►│  ├── ray-worker-1 (GPU)  │
│  ray.init(address=   │              │  ├── ray-worker-2 (GPU)  │
│   'ray://<IP>:10001')│              │  └── ray-worker-3 (GPU)  │
└──────────────────────┘              │                          │
                                      │  Core API (8000)         │
                                      │  Ray Dashboard (8265)    │
                                      └──────────────────────────┘
```

### Prerequisites

* Both machines on the same local network
* Docker and Docker Compose installed on both machines
* The [lattice-edge](https://github.com/lat-labs/lattice-edge) repo cloned on your local machine
* To run your own Core node, you need access to the Lattice-Core Docker images. Contact <logan@latticelab.ai> to request access.

***

### Step 1: Start the Core Cluster (GPU Server)

On the GPU server, start the Ray cluster and API:

```bash
cd core/docker
docker network create lattice-network  # if not already created
docker compose up -d
```

Verify it's running:

```bash
docker exec lattice-ray-head ray status
```

You should see the head node and workers listed with available GPUs and CPUs.

Note the GPU server's IP address on your local network:

```bash
hostname -I
# Example output: 192.168.0.146
```

### Step 2: Verify Network Connectivity

From your local machine (Edge), confirm you can reach the GPU server:

```bash
# Check basic connectivity
ping 192.168.0.146

# Check Ray client port is open
nc -zv 192.168.0.146 10001

# Check Core API is responding
curl http://192.168.0.146:8000/docs

# Check Ray Dashboard is accessible
curl -s http://192.168.0.146:8265 | head -1
```

All four should succeed. If port 10001 is unreachable, check that the Core's `docker-compose.yml` maps port 10001 to the host and that no firewall is blocking it.
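
If `nc` or `curl` is not available on the Edge machine, the same port checks can be approximated with Python's standard library. This is a minimal sketch: the helper names `port_is_open` and `check_core_ports` are illustrative and not part of Lattice, and a successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports listed in this guide for the Core (GPU server).
CORE_PORTS = {"Ray client": 10001, "Core API": 8000, "Ray Dashboard": 8265}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_core_ports(host: str, ports: dict = CORE_PORTS) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


# Example (replace the IP with your GPU server's address):
# for name, ok in check_core_ports("192.168.0.146").items():
#     print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```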

### Step 3: Configure the Edge Node

In your `lattice-edge` directory, create a `.env` file:

```bash
cd lattice-edge
cp .env.example .env  # or create from scratch
```

Set the following variables, replacing the IP with your GPU server's address:

```dotenv
# Compute plane connection
LATTICE_RAY_ADDRESS=ray://192.168.0.146:10001
LATTICE_RAY_NAMESPACE=lattice
LATTICE_CORE_API_URL=http://192.168.0.146:8000

# Local settings
JUPYTER_PORT=8889
```
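
Before starting the Edge, it can help to confirm that the `.env` file actually contains the connection variables. The sketch below is a deliberately simplified parser (it ignores quoting rules, `export` prefixes, and variable expansion that real dotenv tooling handles), and `parse_dotenv` and `missing_keys` are hypothetical helper names.

```python
# Variables this guide sets for two-machine mode.
REQUIRED_KEYS = ("LATTICE_RAY_ADDRESS", "LATTICE_CORE_API_URL")


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes if the value was quoted.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Example:
# with open(".env") as f:
#     print(missing_keys(parse_dotenv(f.read())))
```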

#### Important: Match the Ray Version

The Ray client on the Edge must match the Ray version on the Core cluster **exactly**. Check the Core version:

```bash
docker exec lattice-ray-head python3 -c "import ray; print(ray.__version__)"
```

Then install the matching version on the Edge:

```bash
pip install "ray[client]==<version from above>"
```

For example, if Core runs Ray 2.10.0:

```bash
pip install "ray[client]==2.10.0"
```
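
The exact-match requirement can also be enforced at the top of a notebook so that a mismatch fails fast with a readable message. This is a small sketch, not part of Lattice: `assert_versions_match` is a hypothetical helper, and the Core version string is assumed to be pasted in by hand from the `docker exec` command above.

```python
def assert_versions_match(edge_version: str, core_version: str) -> None:
    """Raise a descriptive error unless both Ray versions are identical."""
    edge, core = edge_version.strip(), core_version.strip()
    if edge != core:
        raise RuntimeError(
            f"Ray version mismatch: Edge has {edge}, Core has {core}. "
            f'Install the Core version on the Edge: pip install "ray[client]=={core}"'
        )


# Example, run on the Edge before calling ray.init():
# import ray
# assert_versions_match(ray.__version__, "2.10.0")  # Core version, entered manually
```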

> **Why this matters:** Ray uses gRPC for client-server communication. If the versions don't match, the connection does not report a version error; it times out with `ConnectionError: ray client connection timeout`, which gives no hint about the real cause.

### Step 4: Start the Edge Node

```bash
cd lattice-edge
docker compose --profile edge up -d
```

Open Jupyter in your browser at <http://localhost:8889>.

### Step 5: Connect to the Compute Plane

In a new notebook cell, connect to the remote Ray cluster:

```python
import ray
import os

# Connect to the remote compute plane
address = os.environ.get('LATTICE_RAY_ADDRESS', 'ray://192.168.0.146:10001')
namespace = os.environ.get('LATTICE_RAY_NAMESPACE', 'lattice')

ray.init(address=address, namespace=namespace)

# Verify the connection
resources = ray.cluster_resources()
print(f"Connected to: {address}")
print(f"  CPUs:   {resources.get('CPU', 0)}")
print(f"  GPUs:   {resources.get('GPU', 0)}")
print(f"  Memory: {resources.get('memory', 0) / 1e9:.1f} GB")
print(f"  Nodes:  {len(ray.nodes())}")
```

Expected output (example for a 4-GPU server):

```
Connected to: ray://192.168.0.146:10001
  CPUs:   14.0
  GPUs:   4.0
  Memory: 212.0 GB
  Nodes:  4
```
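
For a per-node view rather than cluster totals, the list returned by `ray.nodes()` can be condensed into a small table. The sketch below assumes each node entry is a dict with `Alive`, `NodeManagerHostname`, and `Resources` keys, which may differ between Ray versions; `summarize_nodes` is an illustrative helper name.

```python
def summarize_nodes(nodes: list) -> list:
    """Reduce ray.nodes() entries to hostname, liveness, and CPU/GPU counts."""
    rows = []
    for node in nodes:
        resources = node.get("Resources", {})
        rows.append({
            "hostname": node.get("NodeManagerHostname", "unknown"),
            "alive": bool(node.get("Alive", False)),
            "cpus": resources.get("CPU", 0),
            "gpus": resources.get("GPU", 0),
        })
    return rows


# Example, after ray.init(...) has connected:
# for row in summarize_nodes(ray.nodes()):
#     print(row)
```

A node that reports `alive: False` or zero GPUs here is a useful starting point if the totals above look lower than expected.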

### Step 6: Run a Test Job on Remote GPUs

Confirm that tasks execute on the GPU server, not your local machine:

```python
@ray.remote(num_gpus=1)
def gpu_check():
    import socket
    import torch
    return {
        "cuda_available": torch.cuda.is_available(),
        "device_name": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
        "hostname": socket.gethostname()
    }

result = ray.get(gpu_check.remote())
print(result)
```

Expected output:

```python
{
    "cuda_available": True,
    "device_name": "NVIDIA GeForce RTX 3090",  # or whatever GPU the server has
    "hostname": "ray-worker-1"                   # runs on compute, not your machine
}
```

The `hostname` should be a Ray worker container name (like `ray-worker-1`), confirming the task ran on the GPU server.
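
To see how tasks spread across workers, several tasks can be launched at once and their hostnames tallied. This is a sketch under assumptions: `count_hosts` and `survey_hosts` are hypothetical helpers, `survey_hosts` requires an active `ray.init(...)` connection, and Ray's scheduler may still place several short tasks on the same worker.

```python
from collections import Counter


def count_hosts(results: list) -> dict:
    """Count how many task results came from each hostname."""
    return dict(Counter(result["hostname"] for result in results))


def survey_hosts(num_tasks: int = 4) -> dict:
    """Run num_tasks one-GPU tasks and report which hosts executed them."""
    import ray  # imported lazily so count_hosts works without Ray installed

    @ray.remote(num_gpus=1)
    def where_am_i():
        import socket
        return {"hostname": socket.gethostname()}

    return count_hosts(ray.get([where_am_i.remote() for _ in range(num_tasks)]))


# Example, after ray.init(...) has connected:
# print(survey_hosts(4))
```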

***

### Configuration Reference

#### Edge `.env` Variables

| Variable                | Description                     | Default                                      |
| ----------------------- | ------------------------------- | -------------------------------------------- |
| `LATTICE_RAY_ADDRESS`   | Ray client connection string    | `ray://ray-head:10001` (same-machine Docker) |
| `LATTICE_RAY_NAMESPACE` | Ray namespace for job isolation | `lattice`                                    |
| `LATTICE_CORE_API_URL`  | Core API base URL               | `http://ray-head:8000` (same-machine Docker) |
| `JUPYTER_PORT`          | Local port for Jupyter          | `8889`                                       |

#### Core Ports (GPU Server)

| Port  | Service            | Protocol |
| ----- | ------------------ | -------- |
| 10001 | Ray Client Server  | gRPC/TCP |
| 8000  | Lattice Core API   | HTTP     |
| 8265  | Ray Dashboard      | HTTP     |
| 6379  | Ray GCS (internal) | TCP      |

***

### Troubleshooting

#### `ConnectionError: ray client connection timeout`

This is the most common error. Work through these checks in order:

**1. Network reachability**

```bash
ping <gpu-server-ip>
nc -zv <gpu-server-ip> 10001
```

If ping fails, the machines may not be on the same network, or a firewall may be blocking ICMP. If the connection to port 10001 is refused, the Ray head container may not be running or the port isn't mapped to the host.
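
The difference between a refused connection and a timeout can be made explicit in Python. In this sketch, `diagnose_port` is a hypothetical helper, and the interpretation is a rule of thumb rather than a guarantee: "refused" usually means the host is reachable but nothing is listening on the port, while "timeout" more often points to a firewall or a host that is not on the network.

```python
import socket


def diagnose_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, or unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout window.
        return "timeout"
    except OSError as exc:
        # DNS failures, no route to host, and similar errors.
        return f"unreachable ({exc})"


# Example:
# print(diagnose_port("<gpu-server-ip>", 10001))
```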

**2. Ray version mismatch** (most likely cause)

```bash
# On Edge
python3 -c "import ray; print(ray.__version__)"

# On Core
docker exec lattice-ray-head python3 -c "import ray; print(ray.__version__)"
```

These **must match exactly**. Install the matching version on the Edge:

```bash
pip install "ray[client]==<core-version>"
```

**3. Ray client server not started**

```bash
docker exec lattice-ray-head ss -tlnp | grep 10001
```

If nothing is listening on 10001 inside the container, the Ray head's `start.sh` may not include `--ray-client-server-port=10001`. Check:

```bash
docker exec lattice-ray-head cat /app/start.sh
```

#### `ModuleNotFoundError` in remote tasks

The Ray workers need the same Python packages your task uses. If your `@ray.remote` function imports a package that isn't installed on the workers, it will fail. Either:

* Add the package to `core/docker/ray-worker/requirements.txt` and rebuild
* Use Ray's `runtime_env` to install packages on the fly:

  ```python
  @ray.remote(num_gpus=1, runtime_env={"pip": ["transformers"]})
  def my_task():
      ...
  ```
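
Before choosing between the two options, it can be useful to find out which modules the workers are actually missing. The sketch below uses `importlib.util.find_spec` to test importability; `missing_packages` and `missing_on_worker` are hypothetical helpers, and the names passed in must be top-level import names (for example `sklearn`), which are not always the same as pip package names (`scikit-learn`).

```python
def missing_packages(names: list) -> list:
    """Return the module names that cannot be imported in this environment."""
    import importlib.util  # imported here so the function is self-contained on a worker

    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


def missing_on_worker(names: list) -> list:
    """Run the same check on a Ray worker (requires an active connection)."""
    import ray
    return ray.get(ray.remote(missing_packages).remote(names))


# Example, after ray.init(...) has connected:
# print(missing_on_worker(["torch", "transformers"]))
```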

#### Can't access Ray Dashboard

Open `http://<gpu-server-ip>:8265` in your browser. If it doesn't load, verify the container is running and port 8265 is mapped:

```bash
docker ps | grep ray-head
```

#### Jobs run but can't read/write files

When Edge and Core are on separate machines, they don't share a filesystem. Files on your laptop aren't visible to Ray workers. For now, pass data directly through Ray objects:

```python
# Instead of reading a file path on the worker:
@ray.remote(num_gpus=1)
def process(data):
    # data is passed through Ray's object store, no filesystem needed
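    return do_something(data)  # placeholder for your own processing logic

# Read the file on the Edge, then ship its contents to the worker
with open("local_file.csv") as f:
    data = f.read()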
result = ray.get(process.remote(data))
```

For larger datasets, shared storage (NFS or S3) will be added in a future update.
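
On the worker side, file contents passed as a string can be parsed entirely in memory, so no file path is needed on the Core. The following sketch uses only the standard library; `summarize_csv` is an illustrative stand-in for real processing logic, not a Lattice function.

```python
import csv
import io


def summarize_csv(text: str) -> dict:
    """Parse CSV text in memory and report its row count and column names."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {"rows": len(rows), "columns": list(reader.fieldnames or [])}


# Example, after ray.init(...) has connected:
# import ray
# remote_summarize = ray.remote(num_gpus=1)(summarize_csv)
# with open("local_file.csv") as f:
#     print(ray.get(remote_summarize.remote(f.read())))
```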

***

### Switching Between Modes

#### Single-machine mode (default)

Remove the `.env` file, comment out its entries, or set the defaults explicitly:

```dotenv
LATTICE_RAY_ADDRESS=ray://ray-head:10001
LATTICE_CORE_API_URL=http://ray-head:8000
```

Both Edge and Core must be on the same Docker network (`lattice-network`).

#### Two-machine mode

Set the `.env` to point at the GPU server's IP:

```dotenv
LATTICE_RAY_ADDRESS=ray://192.168.0.146:10001
LATTICE_CORE_API_URL=http://192.168.0.146:8000
```

No code changes needed — the same notebooks work in both modes.
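
A notebook can report which mode it is running in by inspecting the configured address. This sketch assumes, as in the defaults listed in the configuration reference, that single-machine mode uses the Docker hostname `ray-head`; `detect_mode` is a hypothetical helper.

```python
from urllib.parse import urlparse

DEFAULT_RAY_ADDRESS = "ray://ray-head:10001"  # same-machine default from this guide


def detect_mode(ray_address: str = DEFAULT_RAY_ADDRESS) -> str:
    """Return 'single-machine' for the Docker default host, else 'two-machine'."""
    host = urlparse(ray_address).hostname
    return "single-machine" if host == "ray-head" else "two-machine"


# Example:
# import os
# print(detect_mode(os.environ.get("LATTICE_RAY_ADDRESS", DEFAULT_RAY_ADDRESS)))
```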

***

### What's Next

* **Shared artifact storage** — NFS or S3 mount so Edge and Core can share files
* **Multi-user support** — Multiple Edge nodes connecting to a single Core cluster with job isolation
* **Remote access via Cloudflare Tunnel** — Connect to the Core cluster from anywhere, not just the LAN
* **Resource quotas** — Prevent one Edge node from consuming all GPUs on the cluster


