Software Development Engineer, EC2 Instance Networking Job at Amazon Development Center U.S., Inc., Sunnyvale, CA

  • Amazon Development Center U.S., Inc.
  • Sunnyvale, CA

Job Description

DESCRIPTION

Join our team building the scale-out networking backbone that powers the world's largest AI training clusters. We're developing high-performance RDMA and RoCE solutions that enable distributed training of trillion-parameter models across thousands of compute nodes on AWS infrastructure.

Our team is responsible for creating the networking software that connects massive AI accelerator clusters, focusing on SmartNIC integration, collective communication optimization, and ultra-high-bandwidth inter-rack connectivity. You'll be working at the intersection of cloud infrastructure and state-of-the-art AI hardware to solve some of the most challenging networking problems in distributed computing.

Key job responsibilities
Your responsibilities will include:
* Design and develop high-performance networking software solutions utilizing RDMA and RoCE technologies for large-scale AI clusters
* Integrate SmartNIC acceleration hardware with EC2 control plane systems and APIs
* Implement and optimize collective communication patterns for distributed AI training workloads
* Develop comprehensive performance monitoring, metrics collection, and benchmarking tools for high-bandwidth cluster interconnects
* Create automated testing frameworks and stress testing tools for multi-rack distributed systems
* Debug complex system-level issues across hardware acceleration, kernel networking, and distributed applications
* Collaborate on architecture decisions for next-generation scale-out AI infrastructure
* Participate in design reviews, code reviews, and technical documentation

About the team
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

BASIC QUALIFICATIONS

- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Strong programming skills in C/C++ with focus on high-performance systems
- Experience with RDMA technologies and RoCE implementations
- Familiarity with collective communication libraries (NCCL, RCCL, OneCCL, MPI)
- Experience with Linux networking, kernel development, and distributed systems
- Understanding of high-performance computing clusters and parallel programming

Job Tags

Full time
