Troubleshooting Guide

This guide helps you diagnose and resolve common issues with the NexusAI Deployer.


Application Issues

Application Won't Start

Symptoms:

  • Application doesn't open
  • Splash screen appears then closes
  • Error message on startup

Solutions:

  1. Check system requirements
     • Windows 10/11, macOS 10.15+, or Linux
     • 4 GB RAM minimum
     • 500 MB disk space

  2. Run as administrator (Windows)
     • Right-click → Run as administrator

  3. Check antivirus
     • Add the application to exclusions
     • Temporarily disable the antivirus and test

  4. Reinstall the application
     • Uninstall completely
     • Download a fresh installer
     • Install again

"Failed to Load Capabilities"

Symptoms:

  • Capability list is empty
  • Error message about backend connection

Expected View:

Capability Selection

Solutions:

  1. Check network connection
     • Verify internet access
     • Check firewall settings

  2. Restart application
     • Close completely
     • Wait 10 seconds
     • Reopen

  3. Check proxy settings
     • Configure system proxy
     • Set HTTP_PROXY environment variable
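
As a minimal sketch of the proxy step above, the variable can be set in the shell that launches the application. The proxy address `http://proxy.example.com:8080` is a placeholder. Setting `HTTPS_PROXY` and `NO_PROXY` as well is an assumption based on common convention; this guide only names `HTTP_PROXY`.

```shell
# Placeholder proxy address; replace with your organization's proxy.
PROXY_URL="http://proxy.example.com:8080"

# HTTP_PROXY is the variable named in this guide.
export HTTP_PROXY="$PROXY_URL"

# Assumption: many tools also read these; the Deployer may or may not.
export HTTPS_PROXY="$PROXY_URL"
export NO_PROXY="localhost,127.0.0.1"

echo "HTTP_PROXY=$HTTP_PROXY"
```

Start the application from the same shell session so that it inherits these variables.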

Application Freezes

Symptoms:

  • UI becomes unresponsive
  • "Not Responding" in title bar

Solutions:

  1. Wait for operation to complete
     • Some operations take time
     • Check progress indicators

  2. Force close and restart
     • Task Manager → End Task (Windows)
     • Force Quit (macOS)

  3. Check system resources
     • Close other applications
     • Free up memory

Authentication Issues

"Access Denied" Error

Symptoms:

  • Credential validation fails
  • "Access Denied" message

Where This Occurs:

AWS Configuration

Solutions:

  1. Verify credentials
     • Check Access Key ID is correct
     • Verify Secret Access Key
     • Ensure no extra spaces

  2. Check IAM permissions
     • Verify policy is attached
     • Check for explicit denies
     • Review permission boundaries

  3. Check account status
     • Ensure account is active
     • Verify no billing issues
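
The credential checks above can be sketched from a terminal, assuming the AWS CLI is installed and configured with the same keys you entered in the Deployer. The helper names `has_whitespace` and `whoami_aws` are hypothetical; `aws sts get-caller-identity` is a standard AWS CLI command.

```shell
# Return success (0) if the value contains any whitespace character.
has_whitespace() {
  case "$1" in
    *[[:space:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the account ID and ARN that the current credentials resolve to.
# Requires the AWS CLI; the call fails if the credentials are invalid.
whoami_aws() {
  aws sts get-caller-identity --query '[Account, Arn]' --output text
}

# Example with AWS's documented sample key plus a stray trailing space.
if has_whitespace "AKIAIOSFODNN7EXAMPLE "; then
  echo "Access Key ID contains whitespace; re-enter it"
fi
```

If `whoami_aws` succeeds but the Deployer still reports "Access Denied", the keys themselves are valid and the problem is more likely in the attached policies or a permission boundary.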

"Invalid Role ARN"

Symptoms:

  • Role assumption fails
  • "Invalid ARN" error

Solutions:

  1. Verify ARN format

    arn:aws:iam::123456789012:role/RoleName

  2. Check role exists
     • Open IAM Console
     • Verify role is present

  3. Check trust policy
     • Role must trust your source credentials
     • Verify principal is correct
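
A quick format check for the ARN can be sketched as below. The pattern is an approximation of the IAM role ARN shape shown above (partition, 12-digit account ID, optional path, role name), not an official validator, and the function names are hypothetical.

```shell
# Return success if the argument looks like an IAM role ARN:
# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><name>
is_role_arn() {
  printf '%s\n' "$1" |
    grep -Eq '^arn:aws[a-z-]*:iam::[0-9]{12}:role/[A-Za-z0-9+=,.@_/-]+$'
}

# Confirm the role exists (requires the AWS CLI and iam:GetRole permission).
role_exists() {
  aws iam get-role --role-name "$1" --query 'Role.Arn' --output text
}

is_role_arn "arn:aws:iam::123456789012:role/RoleName" && echo "format OK"
is_role_arn "arn:aws:iam:123456789012:role/RoleName" || echo "format invalid"
```

The second call fails because the region field between `iam` and the account ID must be present but empty, which gives the double colon.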

"Session Expired"

Symptoms:

  • Operations fail mid-deployment
  • "Credentials expired" error

Solutions:

  1. Re-authenticate
     • Go back to AWS Configuration
     • Enter credentials again
     • Validate before continuing

  2. For ADFS users
     • Sessions expire after 1 hour
     • Re-authenticate if deployment is long

  3. Use longer session duration
     • Configure role with longer max session
     • Up to 12 hours for IAM roles
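
Raising a role's maximum session duration can be sketched with the AWS CLI as follows. The helper names are hypothetical, and the caller needs `iam:UpdateRole` permission. IAM expects the value in seconds, from 3600 (1 hour) to 43200 (12 hours).

```shell
# Convert hours to seconds, rejecting values outside the 1-12 hour range
# that IAM accepts for a role's maximum session duration.
session_seconds() {
  hours="$1"
  if [ "$hours" -ge 1 ] 2>/dev/null && [ "$hours" -le 12 ]; then
    echo $((hours * 3600))
  else
    echo "hours must be an integer from 1 to 12" >&2
    return 1
  fi
}

# Raise the maximum session duration of a role.
# $1: role name, $2: duration in hours.
set_max_session() {
  aws iam update-role --role-name "$1" \
    --max-session-duration "$(session_seconds "$2")"
}

session_seconds 12   # prints 43200
```

Raising the maximum only permits longer sessions. Whether the Deployer then requests a longer session when it assumes the role is not covered in this guide.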

"MFA Required"

Symptoms:

  • Authentication fails
  • "MFA token required" error

Solutions:

  1. Use MFA-enabled authentication
     • Get MFA code from authenticator
     • Enter when prompted

  2. Use role assumption
     • Create role without MFA requirement
     • Assume role from MFA-protected user
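
To confirm outside the Deployer that an MFA code is accepted, a session can be requested with the AWS CLI. This is a sketch only: the function names are hypothetical, the six-digit check assumes a standard virtual authenticator, and whether the Deployer accepts the resulting temporary credentials is not stated in this guide.

```shell
# Return success if the argument is exactly six digits.
is_mfa_code() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{6}$'
}

# Request temporary credentials using an MFA code.
# $1: ARN of the MFA device, $2: code from the authenticator.
mfa_session() {
  aws sts get-session-token \
    --serial-number "$1" \
    --token-code "$2" \
    --query 'Credentials.[AccessKeyId, Expiration]' \
    --output text
}

is_mfa_code "123456" && echo "code format OK"
```

A successful `mfa_session` call prints the temporary key ID and its expiry time, which shows that the device ARN and code are valid.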

Permission Verification Issues

Verification Checks Failing

What You'll See:

Permission Verification

Status Indicators:

  • ⏳ Checking - Verification in progress
  • ✅ Passed - Check successful
  • ❌ Failed - Check failed (see details)
  • ⚠️ Warning - Non-blocking issue

Common Verification Failures

Check              | Common Cause        | Solution
-------------------|---------------------|------------------------
AWS Account Access | Invalid credentials | Re-enter credentials
IAM Permissions    | Missing policy      | Attach required policy
Frontend Artifact  | S3 access denied    | Check S3 permissions
Backend Image      | ECR access denied   | Check ECR permissions
Network Resources  | VPC limits          | Request limit increase
Service Limits     | ECS quota exceeded  | Request quota increase

Handling Failures

  1. Review the error message
  2. Click "View Details" for more information
  3. Fix the issue in AWS Console
  4. Click Re-verify to check again

Deployment Issues

Deployment Stuck at 0%

Symptoms:

  • Progress bar doesn't move
  • No logs appearing

Expected View:

Deployment Progress

Solutions:

  1. Check network connectivity
     • Verify AWS API access
     • Check firewall rules

  2. Verify credentials are valid
     • Go back and re-validate
     • Check session hasn't expired

  3. Check CloudFormation in AWS Console
     • May show more details
     • Look for stack events
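
The stack events mentioned above can also be read from a terminal. The sketch below assumes the AWS CLI is configured for the target account and region; `recent_stack_events` is a hypothetical helper name, and the stack name is whatever your deployment created.

```shell
# Show the ten most recent events for a stack (the API returns newest first).
# $1: stack name.
recent_stack_events() {
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --query 'StackEvents[:10].[Timestamp, LogicalResourceId, ResourceStatus, ResourceStatusReason]' \
    --output table
}
```

For example, `recent_stack_events capability-prod-backend` uses the stack name that appears in this guide's recovery examples. If the command reports that the stack does not exist, the deployment has probably not reached CloudFormation yet, which points back to network or credential problems.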

"Stack Already Exists"

Symptoms:

  • Deployment fails immediately
  • "Stack already exists" error

Solutions:

  1. Use Update instead of Install
     • Go to Capability Selection
     • Click Update on existing capability

  2. Delete existing stack first
     • Use Delete action
     • Then try Install again

  3. Use Clean Deploy strategy
     • Automatically handles existing stacks
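
Which of the options above fits depends on the state of the existing stack. The sketch below reads that state with the AWS CLI and maps it to one of the actions named in this section. The mapping is a rule of thumb, not the Deployer's own logic, and both function names are hypothetical.

```shell
# Print a stack's current status, or NOT_FOUND if the lookup fails.
# Note: any CLI error, including invalid credentials, prints NOT_FOUND.
stack_status() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text 2>/dev/null ||
    echo "NOT_FOUND"
}

# Map a CloudFormation status to a suggested installer action.
suggest_action() {
  case "$1" in
    NOT_FOUND)
      echo "Install" ;;
    CREATE_COMPLETE|UPDATE_COMPLETE|UPDATE_ROLLBACK_COMPLETE)
      echo "Update" ;;
    ROLLBACK_COMPLETE|CREATE_FAILED|ROLLBACK_FAILED)
      echo "Delete, then Install (or Clean Deploy)" ;;
    *_IN_PROGRESS)
      echo "Wait for the current operation" ;;
    *)
      echo "Inspect in the CloudFormation console" ;;
  esac
}

suggest_action "ROLLBACK_COMPLETE"
```

A stack in `ROLLBACK_COMPLETE` cannot be updated in CloudFormation, which is why that state maps to deletion rather than Update.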

"Resource Limit Exceeded"

Symptoms:

  • Deployment fails
  • "Limit exceeded" error

Solutions:

  1. Check service quotas
     • AWS Console → Service Quotas
     • Request increase if needed

  2. Common default limits (confirm current values in Service Quotas, since defaults vary by account and change over time):
     • VPCs per region: 5
     • Elastic IPs: 5
     • ECS clusters: 10000
     • S3 buckets: 100

  3. Clean up unused resources
     • Delete old deployments
     • Remove orphaned resources
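
The quota check above can be sketched with the AWS CLI, assuming it is configured for the deployment region. The helper names are hypothetical. `list_quotas` takes a Service Quotas service code such as `vpc` or `ecs`.

```shell
# Count VPCs in the current region (requires ec2:DescribeVpcs).
vpc_count() {
  aws ec2 describe-vpcs --query 'length(Vpcs)' --output text
}

# List the applied quotas for one service.
list_quotas() {
  aws service-quotas list-service-quotas --service-code "$1" \
    --query 'Quotas[].[QuotaName, Value]' --output table
}

# Print how many more resources fit under a limit.
# Usage: headroom USED LIMIT
headroom() {
  echo $(( $2 - $1 ))
}

headroom 4 5   # prints 1
```

For example, `headroom "$(vpc_count)" 5` shows how many VPCs can still be created under the default limit listed above. A result of 0 means the next deployment that creates a VPC will fail.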

"Certificate Validation Failed"

Symptoms:

  • Deployment stuck at certificate step
  • "Pending validation" status

Solutions:

  1. Check Route53 hosted zone
     • Verify zone ID is correct
     • Ensure domain matches

  2. Check DNS records
     • CNAME records should be created
     • May take up to 30 minutes

  3. Manual validation
     • Go to ACM Console
     • Check validation status
     • Create DNS records manually if needed
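
The validation records that ACM expects can be read with the AWS CLI and compared with public DNS. This is a sketch under the assumption that the certificate uses DNS validation; the function names are hypothetical, and `dig` must be installed for the second helper.

```shell
# Print domain, validation status, and the expected CNAME name and value
# for each domain on a certificate. $1: certificate ARN.
cert_validation_records() {
  aws acm describe-certificate --certificate-arn "$1" \
    --query 'Certificate.DomainValidationOptions[].[DomainName, ValidationStatus, ResourceRecord.Name, ResourceRecord.Value]' \
    --output text
}

# Return success if the given record name resolves to a CNAME.
cname_resolves() {
  [ -n "$(dig +short "$1" CNAME)" ]
}
```

If `cert_validation_records` shows `PENDING_VALIDATION` and `cname_resolves` fails for the listed record name, the CNAME is missing from the hosted zone or has not propagated yet.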

"ECS Tasks Not Starting"

Symptoms:

  • Deployment completes but service unhealthy
  • Tasks keep stopping

Solutions:

  1. Check container logs

    aws logs tail /ecs/capability-prod --follow

  2. Verify container image
     • Check ECR repository
     • Ensure image exists and is accessible

  3. Check task definition
     • Memory/CPU settings
     • Environment variables
     • Health check configuration

  4. Check security groups
     • Allow inbound on health check port
     • Allow outbound to required services
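
ECS records why each task stopped, which often narrows the checks above to one of them. The sketch below assumes the AWS CLI is configured; the function name is hypothetical, and the cluster and service names are whatever your deployment created.

```shell
# Print the stop reason for recently stopped tasks in a service.
# $1: cluster name, $2: service name.
stopped_task_reasons() {
  task_arns=$(aws ecs list-tasks --cluster "$1" --service-name "$2" \
    --desired-status STOPPED --query 'taskArns' --output text)
  [ -n "$task_arns" ] && [ "$task_arns" != "None" ] || {
    echo "no stopped tasks found"; return 0; }
  # Word splitting is intentional: each ARN becomes its own argument.
  aws ecs describe-tasks --cluster "$1" --tasks $task_arns \
    --query 'tasks[].[taskArn, stoppedReason, containers[0].reason]' \
    --output text
}
```

Reasons that mention pulling the image point to the ECR checks, out-of-memory reasons point to the task definition, and failed health checks point to the health check configuration or security groups. ECS keeps stopped tasks visible only for a limited time, so run this soon after a failure.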

Post-Deployment Issues

Health Checks Failing

What You'll See:

Post Verification

Status Indicators:

  • ✅ Healthy - Service is operational
  • ⚠️ Degraded - Service has issues
  • ❌ Unhealthy - Service is down

Solutions:

  1. Wait and retry
     • Services may still be starting
     • Click "Re-verify"

  2. Check individual services
     • Open URLs directly
     • Check for specific errors

  3. Review CloudWatch logs
     • Application errors
     • Connection issues

Application Not Accessible

Symptoms:

  • Frontend URL returns error
  • "Site can't be reached"

Solutions:

  1. Wait for DNS propagation
     • Can take up to 48 hours
     • Usually 5-15 minutes

  2. Check CloudFront distribution
     • Status should be "Deployed"
     • Check origin configuration

  3. Verify S3 bucket
     • Files should be uploaded
     • Bucket policy allows CloudFront

"502 Bad Gateway"

Symptoms:

  • Application loads but shows 502
  • API calls fail

Solutions:

  1. Check ECS service
     • Tasks should be running
     • Health checks passing

  2. Check ALB target group
     • Targets should be healthy
     • Check health check path

  3. Check security groups
     • ALB can reach ECS tasks
     • Correct ports are open
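
Target health for the ALB can be read directly with the AWS CLI. In this sketch the function names are hypothetical, and the target group ARN has to be looked up in the EC2 console or with `aws elbv2 describe-target-groups`.

```shell
# Print each target's ID, port, health state, and reason code.
# $1: target group ARN.
target_health() {
  aws elbv2 describe-target-health --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id, Target.Port, TargetHealth.State, TargetHealth.Reason]' \
    --output text
}

# Count the targets currently reported as healthy.
healthy_count() {
  target_health "$1" | awk -F'\t' '$3 == "healthy"' | wc -l | tr -d ' '
}
```

A healthy count of 0 is consistent with a 502 from the load balancer. The reason code on unhealthy targets, such as a timeout or an unexpected response code, indicates whether to look at security groups or at the health check path.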

Login Not Working

Symptoms:

  • Can't log in with admin credentials
  • "Incorrect username or password"

Check Your Credentials:

Deployment Results

Solutions:

  1. Verify credentials
     • Check Results screen for correct values
     • Credentials are case-sensitive

  2. Check Cognito User Pool
     • User should exist
     • Status should be "Confirmed"

  3. Reset password
     • Use Cognito Console
     • Admin → Reset password
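
The Cognito checks above have AWS CLI equivalents, sketched below. The function names are hypothetical, the user pool ID comes from your deployment, and the caller needs the matching `cognito-idp` admin permissions.

```shell
# Print a user's status, for example CONFIRMED or FORCE_CHANGE_PASSWORD.
# $1: user pool ID, $2: username.
cognito_user_status() {
  aws cognito-idp admin-get-user --user-pool-id "$1" --username "$2" \
    --query 'UserStatus' --output text
}

# Set a new permanent password for a user.
# $1: user pool ID, $2: username, $3: new password.
cognito_set_password() {
  aws cognito-idp admin-set-user-password --user-pool-id "$1" \
    --username "$2" --password "$3" --permanent
}
```

A password passed on the command line can end up in shell history, so consider clearing it afterwards or resetting the password from the console instead.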

Recovery Procedures

Recovering from Failed Deployment

  1. Review error message
     • Note the specific error
     • Check which stack failed

  2. Check CloudFormation Console
     • View stack events
     • Find root cause

  3. Choose recovery strategy
     • Retry: If transient error
     • Clean Deploy: If stack is corrupted
     • Manual fix: If specific resource issue

Recovering from DELETE_FAILED

  1. Identify stuck resources

    aws cloudformation describe-stack-events \
      --stack-name capability-prod-backend \
      --query 'StackEvents[?ResourceStatus==`DELETE_FAILED`]'
    

  2. Manually delete resources
     • Disable deletion protection
     • Empty S3 buckets
     • Remove dependencies

  3. Retry deletion

    aws cloudformation delete-stack \
      --stack-name capability-prod-backend
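
The "Empty S3 buckets" and "Retry deletion" steps can be sketched as two small helpers, assuming the AWS CLI is configured. The function names are hypothetical, and the bucket name must be taken from the failed stack's events.

```shell
# Remove every current object from a bucket so CloudFormation can delete it.
# On a versioned bucket, old versions and delete markers remain and have to
# be removed separately before the bucket can be deleted.
empty_bucket() {
  aws s3 rm "s3://$1" --recursive
}

# Delete the stack and block until deletion finishes or fails.
delete_and_wait() {
  aws cloudformation delete-stack --stack-name "$1" &&
    aws cloudformation wait stack-delete-complete --stack-name "$1"
}
```

`empty_bucket` is destructive and cannot be undone, so confirm the bucket name before running it.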
    

Recovering from UPDATE_ROLLBACK_COMPLETE

  1. Review what caused failure
     • Check CloudFormation events
     • Identify problematic change

  2. Options:
     • Fix and retry: If you can fix the issue
     • Clean Deploy: If stack is too corrupted
     • Rollback manually: Revert configuration

Getting Help

Information to Collect

When contacting support, provide:

  1. Deployment ID from the installer
  2. Error messages (screenshots or text)
  3. Deployment logs (download from installer)
  4. CloudFormation events (from AWS Console)
  5. Application version
  6. Operating system

Log Locations

Application logs:

  • Windows: %APPDATA%\NexusAI Installer\logs\
  • macOS: ~/Library/Logs/NexusAI Installer/
  • Linux: ~/.config/NexusAI Installer/logs/

AWS logs:

  • CloudFormation: AWS Console → CloudFormation → Events
  • ECS: CloudWatch → Log groups → /ecs/capability-
  • Application: CloudWatch → Log groups → /aws/application/
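
On macOS and Linux, the application log directory listed above can be located from a terminal. The sketch below uses a hypothetical helper, `nexus_log_dir`, and takes its paths directly from the list above.

```shell
# Print the installer's log directory for the current OS.
# $1 optionally overrides the kernel name reported by `uname -s`.
nexus_log_dir() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "$HOME/Library/Logs/NexusAI Installer" ;;
    Linux)  echo "$HOME/.config/NexusAI Installer/logs" ;;
    *)      echo "unsupported OS; on Windows see %APPDATA%" >&2; return 1 ;;
  esac
}

# List the five most recently modified log files, if the directory exists.
if log_dir=$(nexus_log_dir 2>/dev/null) && [ -d "$log_dir" ]; then
  ls -t "$log_dir" | head -n 5
fi
```

The directory names contain a space, so keep the path quoted when passing it to other commands.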

Support Channels

  • Documentation: This user guide
  • Support Portal: support.nexus.ai
  • Email: support@nexus.ai

Common Error Messages

Error                           | Cause                  | Solution
--------------------------------|------------------------|----------------------------
"Access Denied"                 | Missing permissions    | Check IAM policy
"Stack already exists"          | Previous deployment    | Use Update or Delete first
"Resource limit exceeded"       | AWS quota reached      | Request limit increase
"Certificate validation failed" | DNS not configured     | Check Route53
"Task failed to start"          | Container issue        | Check ECS logs
"502 Bad Gateway"               | Backend not responding | Check ECS service health
"Session expired"               | Credentials timed out  | Re-authenticate
"Invalid ARN"                   | Malformed role ARN     | Check ARN format

Next Steps