What are the responsibilities and job description for the Monitoring Tool Expert position at Goldenpick Technologies LLC?
Responsibilities
- Platform Ownership
- Network & Monitoring Tools (must have)
- Be familiar with tools such as SolarWinds (including NetPath). As platform owner, ensure platform stability, upgrades, patching, and day-to-day support.
- Understand network-centric monitoring capabilities, including SNMP polling, traps, and device visibility. Ensure new sites and devices are properly onboarded.
- Partner with platform and cloud teams to ensure migrated workloads meet monitoring standards.
- Systems Administration (must have)
- Provide sysadmin support for Linux and Windows servers, including:
- Agent deployment and upgrades (SolarWinds, Datadog, Dynatrace)
- OS-level troubleshooting and configuration
- Monitoring and logging enablement
- Support hybrid environments spanning on-prem and Azure infrastructure.
- Bring a developer mindset, with experience in Dev workflows, GitHub, PowerShell, etc.
- Observability & Event Management Support (should have)
- Have experience with tools such as Datadog and Dynatrace. Collaborate with platform owners to support integrations, data quality, and alerting hygiene.
- Assist with event management workflows, ensuring alerts are actionable and routed correctly.
- Participate in efforts to reduce alert noise and repeat incidents.
- SIEM & Security Visibility (nice to have)
- Develop a working understanding of SIEM concepts and platforms such as Azure Sentinel and CRIBL.
- Support log ingestion, troubleshooting, and collaboration with security and incident response teams.
- Ensure infrastructure and network telemetry support security detection requirements.
- Cloud Monitoring & Azure Integration (should have)
- Have experience with the Azure cloud platform, having either directly supported or become familiar with Azure-based monitoring and logging, including:
- Azure Monitor and Log Analytics integrations
- Observability for Azure-hosted workloads
- Automation, AI & Continuous Improvement (nice to have)
- Explore and apply AI-assisted features within monitoring, event management, and SIEM tools to:
- Improve signal quality / reduce alert fatigue
- Support faster incident triage
- Contribute to documentation, runbooks, and operational improvements focused on small, incremental wins.
- Knowledge Transfer & Operational Resilience
- Participate in knowledge transfer activities related to platform transitions and retirements. Maintain documentation.
- Support on-call or escalation rotations as needed.
Must have
- Minimum of 4-5 years of experience in infrastructure operations, monitoring, observability, or platform operations roles supporting enterprise environments.
- Hands-on experience with systems administration for Linux and Windows servers, including troubleshooting, configuration, and deployment of monitoring or management agents (e.g., SolarWinds, Datadog, Dynatrace).
- Foundational networking knowledge, including concepts such as SNMP, network monitoring, LAN/WAN fundamentals, firewalls, and telemetry collection, sufficient to support network-centric monitoring platforms like SolarWinds.
- Experience with a platform like StruxureWare is nice to have but not required.
- Experience with observability or monitoring platforms, such as SolarWinds, Datadog, Dynatrace, or similar tools, with an understanding of alerting, dashboards, and signal quality.
- Exposure to cloud environments, preferably Microsoft Azure, including familiarity with monitoring and logging concepts (e.g., cloud-based telemetry, logs, metrics, and integrations).
- Basic understanding of incident and event management practices, including alert triage, escalation, and collaboration with incident response or operations teams.
- Demonstrated willingness and ability to learn new technologies quickly, with examples of picking up new platforms, tools, or domains outside of prior core expertise.
- Familiarity with Agile or SAFe ways of working, including collaboration in sprint-based delivery models and cross-functional team engagement, is a plus.
- Strong communication and collaboration skills, with the ability to work effectively with platform owners, operations teams, security teams, and external stakeholders.
- Experience working in a modern Dev workflow using GitHub (branches, pull requests, code reviews, and CI/CD) to manage and deploy scripts and automation used for platform operations.
- Working proficiency in scripting languages such as PowerShell, Python, Bash, or similar.
- Knowledge of Azure, Azure Active Directory (AD), and hybrid cloud environments is a plus.
- Exposure to SIEM concepts or platforms such as Azure Sentinel, CRIBL, or similar is a plus.
- Experience with change management practices in an enterprise IT environment is beneficial.
Salary: $60 - $65