- Design and develop deep learning neural network training workflows with the Cerebras high-performance compute fabric system and software stack; train and benchmark the convergence of deep learning neural networks used in applications such as Natural Language Processing (NLP) and Computer Vision.
- Architect the neural network integration and test strategy for various deep learning networks using the TensorFlow framework.
- Design functional integration and testing of model training convergence/accuracy, training performance and model evaluation.
- Implement and validate efficient data pipeline optimizations for training deep learning neural networks on the Cerebras system.
- Design and develop a robust test framework and debug tools using Python and C/C++ to validate the Cerebras software stack and enable customer-ready workflows with the Cerebras high-performance compute fabric.
- Design and develop automated test suites for validating the Cerebras software stack and for performance benchmarking.
- Design and develop the test harness framework to validate distributed data streaming from containerized platforms such as Docker and Singularity.
- Develop test tools to detect numeric instability, and track continuous integration using Jenkins and issues using Jira.
- Design the solution infrastructure and test beds that replicate customer deployments for training deep learning neural networks on the Cerebras compute fabric system.
- Design a high-performance networking backbone using 10 Gbps and 100 Gbps network switches, network cards and a TCP-offloading FPGA engine.
- Design the distributed systems infrastructure and storage backend, and the related software components that interact with Cerebras systems.
- Work on customer requirements and issues, and provide feedback to the Product Management and Engineering groups.
- Work with customers to gather requirements related to the deep learning model, framework, dataset, required computation kernels and expected training performance.
- Debug customer issues and perform root cause analysis.
- Participate in design reviews, test plan reviews and code reviews with peers within the company, customers and partners.
- Review designs from various hardware and software teams and provide feedback.
- Review customers' models and data input pipelines to understand compatibility and interoperability with the Cerebras software stack and Cerebras system.
- Document designs, test plans, in-house issues and field issues.
- Document design specifications for product features, test framework specifications, test harness descriptions, and in-house and customer issues in cloud shared platforms such as Jira, Google Docs, SharePoint or Dropbox.
- Manage source code versioning using Git.
A Bachelor’s degree or foreign equivalent degree in Computer Information Systems, Computer Software, Computer Science, or a related field and 5 years of post-baccalaureate progressive experience as a Senior Automation Engineer, Principal Software Engineer, Senior Principal Software Engineer, Member of the Technical Staff (Senior Software Engineer), Senior Software Engineer, Software Engineer, or a related occupation required.
The required work experience must include 5 years of experience with the following:
- Distributed storage systems and distributed storage file systems;
- Design and development of test automation frameworks and development infrastructure using Python and C/C++;
- UNIX/Linux operating system and kernel subsystems;
- VMware hyperconverged infrastructure, Docker, Singularity and OpenStack orchestrators;
- Storage Area Network (SAN) technologies including iSCSI, SCSI, FC, and RAID; and
- Integration of workflows to train the deep learning neural networks for Natural Language Processing (NLP) and Computer Vision.
Employer’s name: Cerebras Systems Inc.
Job site: 1237 E Arques Avenue, Sunnyvale, CA 94085
If you are interested in applying for this position, please mail your resume with Job# 105 to HR at Cerebras Systems Inc., 1237 E Arques Avenue, Sunnyvale, CA 94085.