Location & work modality: EMEA (remote)
Start: ASAP
Type of Contract: Permanent, full-time
About Radian Arc
Radian Arc, now part of InferX, Submer's AI cloud and GPU infrastructure platform, provides an infrastructure-as-a-service (IaaS) platform for running cloud gaming, artificial intelligence and machine learning applications inside telecommunication carrier networks. Our teams across the USA, Australia, Central Europe, Malaysia, Singapore and Japan offer telecom operators a GPU-based edge computing platform without the need for capital expenditure, facilitating low latency and improved economics for value-added services and the monetization of 5G investments.
What impact you will have
Mission: Design and build the observability platform that powers visibility, reliability, and performance insights for large-scale GPU cloud infrastructure as well as smaller edge deployments.
This role is responsible for designing and implementing key parts of the observability architecture across the platform, enabling engineering, operations, and customers to understand system behavior in real time across distributed AI workloads, GPU clusters, networking fabrics, storage systems, and edge inference environments.
You will design and operate low-latency, high-scale telemetry pipelines that collect, process, and analyze metrics, logs, and traces from infrastructure running across core datacenter clusters and smaller edge deployments. The platform you build will support internal operations, automated reliability mechanisms, and customer-facing observability experiences.
As a senior engineer, you will lead delivery of major observability initiatives, contribute to the evolution of telemetry standards and SLO implementation, and work with other teams to ensure observability is effectively integrated into the platform architecture from infrastructure to application layers.
You will collaborate closely with infrastructure, networking, storage, and platform engineering teams to provide clear visibility into performance bottlenecks, infrastructure degradation, and distributed workload behavior across both hyperscale GPU environments and smaller edge installations.
This role contributes directly to improving platform reliability by analyzing production telemetry, identifying systemic issues, and driving improvements in performance, efficiency, and operational stability across the stack.
Observability Platform Architecture
Contribute to observability standards across services, including metrics, tracing instrumentation, logging, and SLO implementation.
Infrastructure and Platform Observability
Customer-Facing Observability
Network and Infrastructure Telemetry
Technical Stack: Observability and telemetry technologies used across the platform include:
Observability Framework
Hardware and Infrastructure Telemetry
Experience building metrics, logging, tracing, alerting, and dashboards at production scale.
Networking and Infrastructure Telemetry
What we offer
Attractive compensation package reflecting your expertise and experience.
A great work environment characterized by friendliness, international diversity, flexibility, and a hybrid-friendly approach.
You'll join a fast-growing scale-up with a mission to make a positive impact, and have room to grow your career.
Our job titles may span more than one job level. Actual base pay depends on several factors, such as transferable skills, work experience, business needs, and market demands.
Our Inclusive Responsibility
Radian Arc is committed to creating a diverse and inclusive environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, or any other protected category under applicable law.