Senior DevOps Engineer

Tallinn, Harjumaa, Estonia (on-site, remote, or hybrid)

Own CI/CD and infrastructure, and improve reliability and automation for a high-load SaaS platform that handles time-critical communication globally.

Job description

Stack: Linux, Kubernetes (Helm), PostgreSQL, MongoDB, CI/CD (GitLab CI), Cloud infrastructure (OVH, bare-metal, OpenStack, Cloudflare, AWS), IaC (Terraform, Ansible)

Company description

Textmagic AS is a publicly traded SaaS company listed on Nasdaq First North Tallinn. Our core product is a business messaging platform that enables companies to send A2P SMS, email, and build automated communication flows. Our customers use the platform for urgent alerts, compliant notifications, and other time-critical communication. Trusted by over 25,000 businesses worldwide, our system processes high-volume traffic while meeting strict uptime and performance requirements.

Our team of 40+ professionals is distributed across Estonia (headquarters), Romania, Ukraine, Serbia, and Montenegro. We work in a remote-friendly setup with high ownership and clear accountability. You will join a focused engineering team responsible for maintaining and scaling production systems used daily by thousands of businesses worldwide.

Role overview

We’re looking for a Senior DevOps Engineer to take strong ownership of the infrastructure behind our global SaaS messaging platform. This role is for someone who wants to shape how infrastructure is built, operated, and improved. You will be responsible for reliability, scalability, automation, and production stability across all environments.

You will work closely with engineering leadership and developers to improve system architecture, deployment processes, security, and operational standards. This is a high-impact role with real influence on technical decisions and how our infrastructure evolves as we grow.

Job requirements

Key responsibilities

CI/CD ownership: Design and own CI/CD pipelines (GitLab CI), improving build speed, deployment safety, and rollback processes across environments
Infrastructure ownership: Own and evolve our cloud and bare-metal infrastructure (OVH, Cloudflare, AWS, OpenStack), ensuring high availability, performance, and stability under load
Infrastructure as code: Lead infrastructure as code practices using Terraform and Ansible, enforcing version control, peer review, and consistency standards
Observability and monitoring: Improve system observability using monitoring, logging, tracing, and alerting tools (Grafana, Prometheus, Loki), and drive proactive reliability improvements
Infrastructure security: Strengthen infrastructure security, including DDoS mitigation, traffic filtering, and access control management
Incident management: Lead root cause analysis of production incidents and implement long-term reliability improvements
Automation: Design automation to reduce manual operational work and improve deployment and recovery processes
Database reliability: Ensure high availability and performance of production databases (PostgreSQL, MongoDB), including backup, recovery, and scaling strategies
Environment management: Ensure consistency and reliability across development, staging, and production environments

Expected qualifications

Linux expertise: Strong Linux system administration experience in high-availability production environments
Kubernetes production experience: Hands-on experience running Kubernetes in production, including scaling, upgrades, and troubleshooting
Systems architecture understanding: Solid understanding of containerization, virtualization, and infrastructure design trade-offs
Networking knowledge: Strong understanding of networking concepts (L2, L4, L7), debugging tools (tcpdump, ngrep), and traffic analysis
Production lifecycle experience: Experience operating and troubleshooting applications in high-availability production environments
CI/CD systems design: Experience designing and maintaining CI/CD systems and deployment workflows
Database operations: Strong experience managing PostgreSQL and MongoDB in production, including performance tuning and reliability
Infrastructure as code: Practical experience with Terraform and configuration management tools (Ansible or similar), following best practices
Monitoring and logging: Experience working with monitoring and log aggregation systems (Grafana, Prometheus, Loki, or similar)
Security awareness: Practical understanding of infrastructure security principles and production hardening
Communication skills: Fluent written English and fluent spoken Russian required

Nice to have

Messaging/telecom background: Experience with telecom or messaging systems (SMPP, Asterisk, Kamailio)
PostgreSQL high availability: Experience with PostgreSQL replication/clustering, backups, and failover (PITR, Patroni/repmgr or similar)
Kubernetes operations: Experience operating Kubernetes clusters in production (upgrades, autoscaling, networking, troubleshooting)
Scripting: Scripting skills in Bash, Python, or Go for automation and internal tooling
Security and traffic protection: Experience mitigating malicious traffic and managing DDoS protection (Cloudflare WAF/rate limiting, fail2ban)
Email deliverability basics: Familiarity with SPF, DKIM, DMARC, and how they affect sending reliability
SRE practices: Experience with SLOs/SLIs, alert quality, and incident postmortems

What we offer

Competitive compensation: Salary aligned with senior-level responsibility
Technical ownership: Real influence on infrastructure decisions and long-term architecture
Production impact: Direct responsibility for high-availability systems used globally
Lean structure: Small team, fast decisions, and minimal management overhead
Tooling freedom: Ability to improve processes, automation, and infrastructure standards
Remote flexibility: Work remotely or from our office, whichever you prefer
Professional growth: Support for relevant training and technical development
Professional culture: High ownership, clear accountability, and direct communication

Why join us?

At Textmagic, infrastructure reliability is critical to the product. Our customers rely on us for time-sensitive communication, which means uptime and performance are not optional.

As a Senior DevOps Engineer, you will work on production systems that operate under real load and strict reliability requirements. You will have the authority to improve how infrastructure is designed, deployed, and maintained, and your decisions will directly affect system stability and performance.

You will join a focused engineering team where ownership is expected and technical decisions matter.