Senior Site Reliability Engineer - CTJ - Poly

Location: Redmond
Posted on: June 23, 2025

Job Description:

Are you interested in shaping the future of Microsoft 365 products that empower our customers to seamlessly create, collaborate, and share within government cloud environments? In this role, you will leverage your expertise in software development, online services, and AI to envision, design, and improve upon next-generation Microsoft 365 government cloud service offerings. The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Senior Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, AI, complexity analysis, and scalable system design.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications:

Required/Minimum Qualifications:
6 years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3 years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 2 years technical experience in software engineering, network engineering, or systems administration.

Other Requirements:

Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role.
These requirements include, but are not limited to, the following specialized security screenings: Candidates must have an active TS and be willing to upgrade to TS/SCI (with polygraph) or have an active TS/SCI and be willing to upgrade to TS/SCI (with polygraph). This role will require candidates to maintain the TS/SCI (with polygraph) clearance. Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. Failure to maintain or obtain the appropriate clearance and/or customer screening requirements may result in employment action up to and including termination.

Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local United States government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law.
To meet this legal requirement, citizenship will be verified via a valid passport, or other approved documents, or verified U.S. government clearance.

Preferred/Additional Qualifications:
7 years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4 years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 3 years technical experience in software engineering, network engineering, or systems administration
OR Doctorate Degree in Computer Science, Information Technology, or related field.

Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Microsoft will accept applications for the role until June 26, 2025.

Responsibilities:

Technical Knowledge and Domain-Specific Expertise:
Demonstrates end-to-end expertise in distributed systems design, interactions between cloud technology layers and components, functions of physical network devices, and dependencies at scale.
Drives efforts within an organization to identify and recommend optimal configurations of cloud technology solutions and develops or modifies the code base that defines infrastructures to improve the reliability and operability of supported products.
Develops end-to-end technical expertise in the architecture, code, features, and operations of specific products as required to implement improvements in product availability, reliability, efficiency, observability, and/or performance.
Drives code/design reviews with the engineering teams that develop and/or manage those products and shares learnings and recommendations across engineering teams working on related products within their organization.
Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies; identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance.
Drives the adoption of new solutions across engineering teams working with related products within an organization and provides guidance and coaching to others on relevant topics.

Contributions to Development and Design:
Leverages technical expertise in the infrastructure of large-scale distributed systems and specific products, as well as objective insights drawn from analyses of production telemetry data, to advocate for, or directly contribute to, changes to the code base to improve the availability, reliability, efficiency, observability, and performance of related sets of products developed and supported by teams within an organization.
Develops, tests, and implements changes to optimize code and improve the observability, reliability, and operability of platforms, systems, and products at scale.
Reviews the effect of these changes to document and share development insights within their team.
Engages with product engineering teams within an organization by driving code/design reviews, hosting regular meetings, and participating in on-call rotations and incident responses throughout product development and operations cycles; leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention.

Driving Operational Excellence:
Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization.
Leverages end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and data changes for related sets of platforms, systems, or products in production using code, tooling, and automation; identifies cases where teams lack the tools and/or capability to manage platforms, systems, or products using code and drives efforts within an organization to expand capabilities and/or tooling accordingly.
Leverages existing tools and automation to enable product engineering teams within their organization to increase the velocity at which they can reliably and safely implement changes in production; monitors the effects of changes across platforms or systems.
Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale.
Contributes to the development of new tooling and/or predictive models to identify and test potential improvements in product development and/or operations, and monitors the impact of changes on operations metrics (e.g., Time-to-X) within an organization.
Identifies optimal uses for existing tools and/or models to identify contributing factors or points of failure that are affecting the availability, reliability, performance, and/or efficiency of systems, platforms, or products; proposes and implements solutions that resolve root cause(s) and prevent issues from occurring in related products by working with product engineering teams within an organization to test and deploy them to production.
Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting complex issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams, owners, and leadership to issues with major customer/business impact and escalates resolution of the highly complex, ambiguous, and impactful issues to include other engineering teams and/or subject matter experts as needed.
Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.
Develops, maintains, and leverages capacity planning models and monitoring tools to forecast product capacity and resource demands; models the predicted effect of changes to capacity plans to optimize code bases to better manage resources in response to dynamic capacity demands.
May contribute to the development of automated resource utilization tools or processes that can dynamically scale compute resources up or down to adjust to capacity demands.
Draws insights from performance and resource monitoring across products within their organization to identify whether there is a need to optimize code, infrastructure, or architecture, or if changes to compute resources are required; uses advanced models to forecast and verify the efficacy of changes at scale and proposes solutions that are aligned with customer/business needs.
Shares insights and best practices that can be applied to improve development and operations across related sets of systems, platforms, and/or products.
Continues to develop their understanding of insights and best practices through interactions with more experienced SREs and members of product engineering teams.
Mentors and coaches other engineers to help them identify and propose relevant solutions.

Additional Responsibilities:
Design, develop, and deliver engineering solutions that serve and protect M365 government clouds.
Own deployment, availability, reliability, performance, and customer escalation targets for sovereign environments.
Proactively identify and reduce issues through design, testing, and implementation of software-based solutions.
Collaborate with Engineering and Program Management partners to translate customer, business, and technical requirements into architectural designs and feature releases.
Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability.
Develop, test, and implement changes to optimize code and improve platforms.
You leverage end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and data changes.
You review the effect of changes to document and share development insights within your team.
You drive code/design reviews, host regular meetings, and participate in on-call rotations and incident responses throughout product development and operations cycles.
In addition, you respond to incidents during regular on-call rotations and share details related to incidents and their resolution through post-mortem reports and regular review meetings.

Other:
Embody our culture and values.

Keywords: Olympia, Senior Site Reliability Engineer - CTJ - Poly, IT / Software / Systems, Redmond, Washington
