Troubleshooting¶
When things go sideways in the cold, start here.
Yeti won't start¶
Check the service status:
systemctl status yeti
journalctl -u yeti -n 50 --no-pager
Common causes:
| Symptom | Cause | Fix |
|---|---|---|
ERR_MODULE_NOT_FOUND |
Missing dependencies | Run npm ci in /opt/yeti |
EADDRINUSE |
Port 9384 already in use | Check for another Yeti process or change port in config |
EACCES on ~/.yeti/ |
Permission mismatch | Ensure the systemd user owns ~/.yeti/ |
| Crash loop | Bad config | Check ~/.yeti/config.json for syntax errors |
Jobs aren't running¶
Nothing is happening at all:
- Check
enabledJobsin your config. An empty array means no jobs run. This is the most common oversight. - Open the dashboard at
http://localhost:9384and verify jobs are listed and not paused.
A specific job isn't picking up work:
- Is the job in
enabledJobs? It must be listed by name. - Is the job paused? Check the dashboard or
pausedJobsin config. - Is the item in
skippedItems? Check the queue page. - Is the previous run still active? Jobs use skip-if-busy --- they won't queue up if the last run hasn't finished.
- Check the logs page (
/logs) filtered by that job name for errors.
Jobs run but produce no output:
- The AI process may be timing out. Check logs for timeout messages.
- Increase
claudeTimeoutMsorcopilotTimeoutMsif tasks are consistently hitting the 20-minute default.
Issues aren't being planned¶
issue-refiner isn't picking up an issue:
- Does the issue have the
Needs Refinementlabel? Regular issues require it. - Is
issue-refinerinenabledJobs? - Is the repo in
allowedRepos(or isallowedReposset tonullfor all repos)? - Is the issue in
skippedItems? - Check
/logsfor the issue-refiner job --- look for errors or "no work found" messages.
Plan quality is poor:
This is a repo configuration issue, not a Yeti issue. See the Optimization page. The short version: add a CLAUDE.md with build commands, architecture overview, and coding conventions.
PRs aren't being created¶
issue-worker runs but no PR appears:
- Tree-diff guard: The worker checks that its changes actually differ from the base branch. If Claude makes no effective changes (empty diff), no PR is created. Check logs for "no tree diff" messages.
- Duplicate PR guard: If an open PR for the same issue already exists, the worker skips it. Check for existing PRs.
- Claude errors: The Claude process may have failed. Check the run log for the specific issue.
PR is created but immediately fails CI:
- This is normal --- the ci-fixer will pick it up on its next cycle.
- If ci-fixer can't fix it, it'll file a
[ci-unrelated]issue if the failure isn't caused by the PR's changes.
CI fixer isn't working¶
ci-fixer runs but doesn't fix failures:
- The fixer classifies failures as "related" (caused by PR changes) or "unrelated" (pre-existing). Check the PR comments for its classification.
- For related failures, Claude reads the CI logs and attempts a fix. If the logs are too large or unclear, the fix may fail.
- For unrelated failures, the fixer files a
[ci-unrelated]issue rather than attempting a fix on the PR.
Merge conflicts aren't being resolved:
- The ci-fixer handles conflicts by merging the base branch and using Claude to resolve markers.
- If the conflict is too complex, Claude may produce an incorrect resolution. Review the commit.
Auto-merger won't merge¶
The auto-merger has strict criteria. A PR must match one of these categories:
Dependabot PRs:
- All checks must be passing. No exceptions.
Doc PRs (yeti/docs-*):
- Only
.mdoryeti/files can be changed. - Checks must pass OR no checks must be configured.
Issue PRs (yeti/issue-*):
- A valid LGTM comment must exist.
- The LGTM must be posted after the latest commit (stale approvals don't count).
If none of these apply, the PR requires manual merge.
Dashboard issues¶
Can't access the dashboard:
- Verify Yeti is running:
curl localhost:9384/health - If
authTokenis set, you need to log in at/login. - If accessing remotely, check firewall rules for port 9384.
Queue shows stale data:
- The queue is populated by the label scanner, which runs every 5 minutes (
queueScanIntervalMs). - If webhooks are configured, the queue cache updates in real-time when labels are added or removed on GitHub.
- Trigger a manual refresh by visiting
/queue--- the page fetches fresh data on load. - If labels were changed outside Yeti (manually on GitHub) and webhooks are not configured, wait for the next scan cycle.
Discord bot not responding¶
- Verify
discordBotTokenanddiscordChannelIdare set in config. - Check that the bot has the required intents: Guilds, GuildMessages, MessageContent.
- Verify your Discord user ID is in
discordAllowedUsers. - Check Yeti logs for Discord connection errors:
journalctl -u yeti | grep -i discord - Ensure the bot has permission to read and send messages in the configured channel.
Rate limiting¶
GitHub API rate limit errors:
Yeti uses a circuit breaker with a 60-second cooldown. When rate-limited, all jobs pause API calls for 60 seconds, then resume. This is automatic.
If you're hitting rate limits frequently:
- Increase job intervals to reduce API call frequency.
- Reduce the number of repos being scanned (
allowedRepos). - Check that
queueScanIntervalMsisn't set too low.
Auto-updater issues¶
Yeti isn't updating:
systemctl status yeti-updater.timer
systemctl status yeti-updater.service
journalctl -u yeti-updater -n 20 --no-pager
- The timer checks every 60 seconds.
- Updates require
ghCLI to be authenticated as the service user. - If a health check fails after update, the updater rolls back automatically.
Rolled back after update:
- Check
journalctl -u yeti-updaterfor the rollback reason. - Usually means the new version's health check (
GET /health) failed within the timeout window. - The previous version is restored and Yeti restarts.
Worktree cleanup¶
Worktrees are created at ~/.yeti/worktrees/ and cleaned up in finally blocks after each job run. On startup, Yeti also cleans up orphaned worktrees from prior crashes.
If worktrees are accumulating:
ls ~/.yeti/worktrees/*/*/
- Orphaned worktrees usually mean a job crashed without cleanup.
- Restarting Yeti will trigger crash recovery, which cleans them up.
- You can safely delete the contents of
~/.yeti/worktrees/while Yeti is stopped.
OAuth / GitHub sign-in¶
"Sign in with GitHub" button doesn't appear:
- All three config fields must be set:
githubAppClientId,githubAppClientSecret, andexternalUrl. - These fields require a restart --- changing them via the dashboard config editor won't take effect until you restart Yeti.
Callback URL mismatch error from GitHub:
- The callback URL registered in your GitHub App settings must exactly match
{externalUrl}/auth/callback. - Check for protocol mismatch (
httpvshttps), trailing slashes, and port numbers.
"Not an org member" error after signing in:
- OAuth checks that the GitHub user belongs to at least one organization in
githubOwners. - The GitHub API's org membership endpoint only works for actual organizations, not personal usernames. If
githubOwnerscontains only personal usernames, all OAuth logins will be denied. - The user must be a member of the org (not just a collaborator on individual repos).
externalUrl misconfiguration:
- Must start with
http://orhttps://. Trailing slashes are stripped automatically. - Must match the URL users actually access (what's in the browser address bar), not an internal hostname.
- If behind a reverse proxy, use the public-facing URL, not
localhost.
Session expired / forced re-login:
- Sessions last 24 hours. After that, users must sign in again.
- Changing
githubAppClientSecretinvalidates all existing sessions.
Webhooks not triggering¶
Jobs don't run immediately after labeling an issue:
- Is
webhookSecretset in config? The endpoint returns 404 without it. - Is the webhook active in your GitHub App settings? Check Settings > Developer settings > GitHub Apps > your app > Webhook.
- Are the right events subscribed? You need Issues and Check runs.
- Check GitHub's webhook delivery log (App settings > Advanced > Recent Deliveries) for failed deliveries or signature mismatches.
- Verify Yeti's external URL is reachable from GitHub. The webhook URL is auto-configured on startup.
Webhook events arrive but jobs are skipped:
- The job must be in
enabledJobs. Webhook events for disabled jobs are silently ignored. - If the job is already running, the event is skipped. Polling catches it on the next cycle.
- The repo must be in
allowedRepos(orallowedReposmust benull).
Getting more information¶
- Dashboard logs:
/logswith filtering by job name, status, and search text. - System logs:
journalctl -u yeti -ffor live tailing. - Log level: Set
logLevelin config to control verbosity. Values:debug(default),info,warn,error. Debug-level logs are suppressed when a higher level is set. Theerrorlevel always executes (it triggers notifications). - Health check:
curl localhost:9384/healthfor a quick alive check. - Status endpoint:
curl localhost:9384/statusfor detailed JSON with job schedules, uptime, queue state, and integration status.