This page last changed on Apr 26, 2013 by roveznav.

Summary

Start Date: 29 October 2012
Release Date: 07 December 2012
Status: Released
Validation Steps: SAM-3071
Validated: 06 March 2013

Description

This release focused mainly on introducing SAM Operational Tools Monitoring. In addition, we fixed bugs identified during the wide deployment of SAM Update 19.

Technical details:

  • 70 tickets resolved
  • Topology aggregation:
    • Added sanity check to compare differences between services declared in local ATP with central ATP
    • Enabled VOMS CSRF support
    • Fixed invalid JSON output from the ATP PI, which was affecting NCG
  • Profile Management:
    • Unit test moved to Django 1.3
  • MyWLCG changes:
    • MyWLCG Error Handling improved
    • Status view improved
    • Added monthly reports to the central MyEGI
    • Two new reports for T0/T1 sites
  • Nagios configuration:
    • Enabled defining contacts for services
    • ncg.localdb partially migrated to ncg-metric-config (JSON format)
    • Optimized Java truststore generation
    • Added option to give permissions to run Nagios commands to anybody with a valid DN
  • Probes:
    • grid-monitoring-probes-ch.cern.sam:
      • MrsCheckDBInsertsDetailed probe improved
      • SamCheckUpdate probe improved
      • Added SAMCentralWebAPI probe
    • hr.srce.GridProxy-Get generates proxies with configurable lifetime
    • Probe libraries ported to SL6 (perl-TOM, python-GridMon and perl-GridMon)
    • Fixed mta-simple problem in grid-monitoring-probes-org.sam
  • MRS metrics disabled on SAM/Nagios nodes
  • SAM configuration changes (glite-yaim-nagios):
    • Consolidated/minimized number of httpd actions
    • Consolidation of YAIM variables (names uppercase)
    • OPS-MONITOR established as new SAM/Nagios configuration (for monitoring operational tools)
    • EGI report: switched Availability and Reliability labels
    • Decommission of MDDB
    • Improved MySQL database dump
    • Source code documentation
    • Removed dependencies on DAG repository

Package List

SAM-Nagios

 atp-1.26.4-1.el5
 glite-yaim-nagios-1.9.18-1.el5
 grid-monitoring-config-gen-0.91.3-1.el5
 grid-monitoring-probes-hr.srce-0.36.1-1.el5
 grid-monitoring-probes-org.sam-0.5.8-1.el5
 grid-monitoring-probes-ch.cern.sam-1.6.13-1.el5
 mrs-1.7.35-1.el5
 mywlcg-1.5.1-5.el5
 mywlcg-atp-api-1.26.1-1.el5
 mywlcg-atp-web-1.26.2-1.el5
 ncg-metric-config-1.2.3-1.el5
 perl-GridMon-1.0.62-1.el5
 python-GridMon-1.1.12-1.el5

SAM-Gridmon

 ace-0.2.5-1.el5
 atp-1.26.4-1.el5
 mrs-1.7.35-1.el5
 mywlcg-1.5.1-5.el5
 mywlcg-atp-api-1.26.1-1.el5
 mywlcg-atp-web-1.26.2-1.el5
 glite-yaim-nagios-1.9.18-1.el5

Configuration Changes

Common

  • New Yaim configuration variables:
    Component | Name                  | Description                       | Default | Mandatory | Example
    ATP       | MSG_DEST_OSG_DOWNTIME | Messaging queue for OSG downtimes | Yes     | Yes       | /topic/grid.management.downtime.RSV

SAM-Gridmon

  • New Yaim configuration variables:
    Component | Name          | Description                     | Default | Mandatory | Example
    MyWLCG    | MYWLCG_TRENDS | Enable/disable Trends on MyWLCG | No      | No        | false

SAM-Nagios

  • New Yaim configuration variables:
    Component | Name                              | Description                                                    | Default | Mandatory | Example
    nagios    | NAGIOS_ENABLE_ANY_DN              | Allow any DN to run Nagios commands on VO-nagioses             | Yes     | No        | false
    NCG       | CRO_BROKER_PASS                   | BROKER_PASSWORD value for msg.cro-ngi.hr broker                | No      | No        | MyPass
    NCG       | EGI1_BROKER_PASS                  | BROKER_PASSWORD value for egi-1.msg.cern.ch broker             | No      | No        | MyPass
    NCG       | EGI2_BROKER_PASS                  | BROKER_PASSWORD value for egi-2.msg.cern.ch broker             | No      | No        | MyPass
    NCG       | GR_BROKER_PASS                    | BROKER_PASSWORD value for broker.afroditi.hellasgrid.gr broker | No      | No        | MyPass
    NCG       | NCG_SERVICE_NOTIFICATIONS_OPTIONS | Nagios notification options for services                       | No      | No        | u,c
    NCG       | NCG_HOST_NOTIFICATIONS_OPTIONS    | Nagios notification options for hosts                          | No      | No        | 'cn'
    NCG       | NCG_PROXY_LIFETIME                | Lifetime option for the refresh_proxy probe (see note below)   | No      | No        | 12
    Note on NCG_PROXY_LIFETIME: the value is passed as the lifetime option to the refresh_proxy probe in grid-monitoring-probes-hr.srce. If unset, the probe uses its default value of 12. Make sure that the MyProxy server supports the configured credential lifetime; otherwise the probe will fail.

Localdb changes

  • Migration of localdb metric configuration to JSON config
    JSON config files stored in /etc/ncg-metric-config.d/ must use .conf suffix. Otherwise NCG will ignore them.
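
Because NCG silently ignores files without the .conf suffix, it can help to check the directory before running it. The following Python sketch is only an illustration and is not part of NCG: the function name check_metric_config_dir is hypothetical, and the check assumes that every .conf file is expected to contain strictly valid JSON.

```python
import json
from pathlib import Path


def check_metric_config_dir(directory):
    """Return (file name, reason) pairs for files that look problematic."""
    directory = Path(directory)
    problems = []
    if not directory.is_dir():
        return problems  # nothing to check on this machine
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".conf":
            # According to the release notes, NCG ignores such files.
            problems.append((path.name, "ignored by NCG: missing .conf suffix"))
            continue
        try:
            json.loads(path.read_text())
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            problems.append((path.name, f"invalid JSON: {exc}"))
    return problems


if __name__ == "__main__":
    for name, reason in check_metric_config_dir("/etc/ncg-metric-config.d"):
        print(f"{name}: {reason}")
```

The strict parser in Python's json module also rejects trailing commas, which are easy to leave behind when editing these files by hand.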

Modifications and overrides of global values that were previously written in NCG localdb format:

  # add/modify configuration parameter
  MODIFY_METRIC_CONFIG!metric!config!value
  # add/modify dependency
  MODIFY_METRIC_DEPENDENCY!metric!dep!value
  # add/modify attribute
  MODIFY_METRIC_ATTRIBUTE!metric!attr!value
  # add/modify parameter
  MODIFY_METRIC_PARAMETER!metric!param!value
  # add/modify flag
  MODIFY_METRIC_FLAG!metric!flag!value

should now be made in JSON configuration files in the directory /etc/ncg-metric-config.d/:

# cat /etc/ncg-metric-config.d/modify.conf
{
   "metric" : {
      "attribute" : {
         "attr" : "value"
      },
      "parameter" : {
         "param" : "value"
      },
      "flags" : {
         "flag" : "value"
      },
      "config" : {
         "config" : "value"
      },
      "dependency" : {
         "dep" : 1
      }
   }
}
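
For sites with many existing overrides, the mapping between the two formats shown above can be applied mechanically. The Python sketch below is illustrative only and is not part of NCG: the function name localdb_to_metric_config and the mapping table are assumptions derived from the example above (note that MODIFY_METRIC_FLAG maps to the "flags" section), and converting numeric dependency values to integers simply mirrors that example.

```python
import json

# Assumed mapping from localdb directive to JSON section, taken from the example above.
SECTION_BY_DIRECTIVE = {
    "MODIFY_METRIC_CONFIG": "config",
    "MODIFY_METRIC_DEPENDENCY": "dependency",
    "MODIFY_METRIC_ATTRIBUTE": "attribute",
    "MODIFY_METRIC_PARAMETER": "parameter",
    "MODIFY_METRIC_FLAG": "flags",
}


def localdb_to_metric_config(lines):
    """Convert MODIFY_METRIC_* localdb lines into a nested dict ready for json.dumps."""
    config = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("!", 3)  # directive!metric!key!value
        if len(parts) != 4 or parts[0] not in SECTION_BY_DIRECTIVE:
            continue  # leave every other localdb directive untouched
        directive, metric, key, value = parts
        section = SECTION_BY_DIRECTIVE[directive]
        if section == "dependency" and value.isdigit():
            value = int(value)  # the example above uses a number here
        config.setdefault(metric, {}).setdefault(section, {})[key] = value
    return config


if __name__ == "__main__":
    sample = [
        "# add/modify parameter",
        "MODIFY_METRIC_PARAMETER!metric!param!value",
        "MODIFY_METRIC_DEPENDENCY!metric!dep!1",
    ]
    print(json.dumps(localdb_to_metric_config(sample), indent=3))
```

The printed text can be saved under /etc/ncg-metric-config.d/ in a file with the .conf suffix; lines that are not MODIFY_METRIC_* directives are skipped and still need to stay in the localdb file.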
  • Enable contacts for service flavour on a given host
    Use NCG localdb to add/enable contacts for all metrics associated with a given service flavour:
    ADD_SERVICEFLAVOURCONTACT!host!ServiceFlavour!email@email.com
    # enables contact even if NCG_ENABLE_NOTIFICATIONS is 0
    ENABLE_SERVICEFLAVOURCONTACT!host!ServiceFlavour!email@email.com
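
When the same contacts have to be set for several hosts, the localdb lines can be generated rather than typed. This is a minimal Python sketch under stated assumptions: the helper name service_flavour_contact_lines, the host name ce01.example.org, the flavour CREAM-CE and the address are all hypothetical placeholders, and only the two directive formats shown above are taken from the release notes.

```python
def service_flavour_contact_lines(host, flavour, emails, force=False):
    """Build localdb lines that attach contacts to a service flavour on a host.

    With force=True the ENABLE_ directive is used, which (per the notes above)
    enables the contact even if NCG_ENABLE_NOTIFICATIONS is 0.
    """
    directive = "ENABLE_SERVICEFLAVOURCONTACT" if force else "ADD_SERVICEFLAVOURCONTACT"
    lines = []
    for email in emails:
        for field in (host, flavour, email):
            if "!" in field:
                # "!" separates the fields, so it cannot appear inside one.
                raise ValueError(f"field must not contain '!': {field}")
        lines.append(f"{directive}!{host}!{flavour}!{email}")
    return lines


if __name__ == "__main__":
    for line in service_flavour_contact_lines(
        "ce01.example.org", "CREAM-CE", ["admin@example.org"], force=True
    ):
        print(line)
```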

Known Issues

The packages sam-nagios-1.20.0-1.el5.noarch.rpm and sam-release-1.20.0-1.el5.noarch.rpm are not in the EGI repository, so /etc/sam-release reports a wrong version on nodes that use the EGI repo.
For machines running the latest version of glite-UI (3.2.10-1 or higher):
Please restart Nagios after running YAIM; otherwise you may see problems similar to SAM-1693.
service nagios restart
Upgrading a node with yum requires a package exclusion, e.g.:
  1. on sam-nagios
    yum update --exclude sam-gridmon
  2. on sam-gridmon
    yum update --exclude sam-nagios
Metrics ch.cern.sam.MrsCheckDBInserts and ch.cern.sam.MrsCheckDBInsertsDetailed have to be disabled manually.
Please add the following lines to the file /etc/ncg/ncg.localdb:
REMOVE_METRIC!ch.cern.sam.MrsCheckDBInserts
REMOVE_METRIC!ch.cern.sam.MrsCheckDBInsertsDetailed
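
If this change is rolled out to several nodes, appending the lines only when they are missing avoids duplicates on repeated runs. The following Python sketch is an assumption-laden illustration, not a SAM tool: the helper name ensure_lines is hypothetical, and it only edits the file; whatever step normally regenerates the Nagios configuration still has to be run afterwards.

```python
from pathlib import Path

REMOVE_LINES = [
    "REMOVE_METRIC!ch.cern.sam.MrsCheckDBInserts",
    "REMOVE_METRIC!ch.cern.sam.MrsCheckDBInsertsDetailed",
]


def ensure_lines(path, required):
    """Append each required line to the file unless it is already present.

    Returns the list of lines that were actually appended.
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [line for line in required if line not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing
```

A call such as ensure_lines("/etc/ncg/ncg.localdb", REMOVE_LINES) would need write access to that file and returns an empty list when both lines are already there.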

Document generated by Confluence on Feb 27, 2014 10:19