SRE Operations Engineer

net2source (n2s) • Canada

Remote

AI Summary

SRE Operations Engineer responsible for monitoring, triaging, and executing standard operational tasks across enterprise applications. Supports Kubernetes, APIs, WAF, databases, API gateways, Kafka, and multi-cloud environments. First line of defense for incident detection, troubleshooting, and escalation using runbooks and automation.

Key Highlights

Monitoring & Infrastructure

Incident Triage & Communication

Kubernetes Operations

Scripting & Automation

Networking & Security Troubleshooting

Technical Skills Required

Kubernetes, APIs, WAF, databases, API gateways (Gloo, Apigee), Kafka, AWS, Azure, GCP, Grafana, Datadog, Splunk, Prometheus, ELK, AIOps tools, Python, Bash, PowerShell, SQL, NoSQL, ITSM tools (ServiceNow, Jira, xMatters)

Benefits & Perks

100% remote

Job Description

Title: SRE Operations Engineer (Canada)

Location: 100% Remote

Role Summary

L1 Site Reliability Engineer responsible for monitoring, triaging, and executing standard operational tasks across enterprise applications
Supports Kubernetes, APIs, WAF, databases, API gateways (Gloo, Apigee), Kafka, and multi-cloud environments (AWS/Azure/GCP)
First line of defense for incident detection, troubleshooting, and escalation using runbooks and automation

Key Responsibilities

Monitoring & Infrastructure
Monitor systems using Grafana, Datadog, Splunk, Prometheus, and AIOps tools
Detect anomalies and follow alert workflows for resolution or escalation
Validate Kubernetes issues using monitoring dashboards and logs

Runbook Execution
Follow predefined runbooks for incident resolution
Restart services, validate system health, and escalate when procedures fail
Ensure adherence to operational standards

Incident Triage & Communication
Perform initial incident triage and severity classification
Collect logs, metrics, and system data for analysis
Communicate clearly with stakeholders and escalation teams

Kubernetes Operations
Use kubectl to inspect pods, deployments, and services
Validate service health and troubleshoot cluster-level issues

Scripting & Automation
Read and modify scripts in Python, Bash, or PowerShell
Support automation of repetitive operational tasks

Networking & Security Troubleshooting
Use tools like ping, curl, netstat, and traceroute
Identify DNS, firewall, WAF, or proxy-related issues

Documentation & Knowledge Management
Document incident resolution steps and system issues
Identify gaps in runbooks and suggest improvements
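
For candidates gauging the level of scripting involved, the DNS and connectivity checks listed under Networking & Security Troubleshooting might look roughly like the Python sketch below. It is an illustration only, not part of the employer's tooling: the function name check_endpoint, the three return labels, and the example host are all assumptions.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'dns_failure', 'tcp_failure', or 'ok' for a host:port pair.

    Hypothetical helper: the name and the return labels are illustrative.
    """
    try:
        # Step 1: resolve the name. A failure here points at DNS.
        addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try a TCP connection to each resolved address. A failure here
    # points at a firewall, proxy, or a service that is not listening.
    for family, socktype, proto, _canonname, sockaddr in addrinfo:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue  # try the next resolved address, if any
    return "tcp_failure"


if __name__ == "__main__":
    # Example host chosen for illustration only.
    print(check_endpoint("example.com", 443))
```

Separating the two steps tells the engineer which team to escalate to: a name that does not resolve is a DNS problem, while a name that resolves but refuses connections suggests a firewall, WAF, or proxy issue.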

Preferred Skills

Familiarity with AWS, Azure, or GCP cloud platforms
Basic SQL/NoSQL knowledge (e.g., simple query validation like SELECT 1)
Experience with ITSM tools such as ServiceNow, Jira, or xMatters
Exposure to observability tools (ELK, Prometheus, Grafana, Splunk)
Understanding of AI-assisted operational support tools
Strong automation mindset and process optimization awareness
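
The SELECT 1 validation mentioned above is a query that returns a constant, so a successful result shows only that the database accepts connections and executes statements. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for whatever database the role actually supports; database_is_responsive is a hypothetical helper name.

```python
import sqlite3


def database_is_responsive(conn: sqlite3.Connection) -> bool:
    """Run SELECT 1 and confirm the database returns the expected row.

    Hypothetical helper: sqlite3 stands in for the real database driver.
    """
    try:
        row = conn.execute("SELECT 1").fetchone()
    except sqlite3.Error:
        # Covers closed connections and other driver-level failures.
        return False
    return row == (1,)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    print(database_is_responsive(connection))
    connection.close()
```

A check like this says nothing about data correctness or query performance; it only confirms the database is reachable before the engineer escalates.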

Qualifications

2–5 years (or more) in IT operations, NOC, or SRE/DevOps roles
Strong understanding of Linux, networking, and Kubernetes fundamentals
Knowledge of cloud-ready applications and observability tools
Strong troubleshooting skills using structured methods (5 Whys, Fishbone analysis)

Deliverables

Continuous monitoring of infrastructure, applications, dashboards, and logs
Execution of standardized runbooks for incidents and routine tasks
First-level incident triage and escalation to L2/L3 teams
Documentation of incidents, gaps, and automation opportunities
Clear communication during operational incidents
Support onboarding of applications into the operations framework

Job Overview

Posted Date: May 09, 2026

Employment Type: Contract

Experience Level: Mid-Senior level

Location: Canada

Category: DevOps

Company: net2source (n2s)
