Remora – Resource Monitoring for Users

Summary

HPC admins are always looking for better ways to monitor the systems for which they are responsible by understanding how the hardware is operating and seeing how user applications are performing. Many tools and techniques, both hardware and software, are available to coordinate monitoring with resource managers (job schedulers), but all of them are administrator-oriented.

Users have precious few tools to monitor the resources their applications are using. With “application telemetry” information, users can understand their application's resource usage pattern, whether it seems to be performing correctly, what resources it consumed, and how well it is balanced across several nodes in the system, or even within a single node.

Remora from TACC can gather this information for you and create plots to help guide you to a better understanding of your application, with little effect on its performance. Typically, the system administrator installs Remora, but users can install it in their own accounts as well.

You can tune the Remora installation, particularly what it monitors. Once it is installed, you just put the command remora before the command that runs the application to start gathering information. A few environment variables adjust how Remora gathers the data, but for the most part, it gathers the data silently.
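The "put remora in front of the command" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Remora's own interface: the helper names, the ./my_app command, and the REMORA_PERIOD setting shown here are assumptions, so check the documentation of your installed Remora version for the environment variables it actually reads.

```python
import os
import shlex
import subprocess


def remora_command(app_command):
    """Return the application command with `remora` placed in front of it."""
    return ["remora", *app_command]


def remora_environment(overrides=None, base=None):
    """Copy an environment and apply optional settings as strings."""
    env = dict(os.environ if base is None else base)
    env.update({key: str(value) for key, value in (overrides or {}).items()})
    return env


def run_monitored(app_command, overrides=None):
    """Run the application under Remora (requires `remora` on PATH)."""
    return subprocess.run(
        remora_command(app_command),
        env=remora_environment(overrides),
        check=True,
    )


# Hypothetical application command, shown only to illustrate the pattern.
# Nothing is executed here; the wrapped command line is just printed.
print(shlex.join(remora_command(["./my_app", "input.dat"])))
```

Calling run_monitored(["./my_app", "input.dat"], {"REMORA_PERIOD": 5}) would then launch the wrapped command, assuming remora is on your PATH and that your version reads a variable of that name.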

Remora is a great tool for users who want an idea of their application's resource usage. Rather than a pure profiler, it is a combination of profiling and system monitoring. Remora is easy to install, fairly light on resource usage, and can be a great help to users.
