This page last changed on Jan 14, 2014 by mbabik.

Summary

Start Date: 8 December 2012
Release Date: 28 October 2013
Status: Released
Validation Steps: SAM-3251
Validation Status: Validated

Description

This release focuses mainly on the integration of EMI probes. In addition, it contains fixes for several bugs identified during the deployment of SAM Update 20.

The following probes were integrated:

  • ARC probes (nordugrid-arc-nagios-plugins-1.6.1-1.rc1.el5)
  • new ARGUS probes (nagios-plugins-argus-1.1.0-2.el5)
  • BDII probes (nagios-plugins-bdii-1.0.14-1.el5)
  • CREAMCE probes (emi-cream-nagios-1.0.1-4.el5.sam)
  • FTS probes (nagios-plugins-fts-1.0.1-1.el5)
  • GLEXEC probe (nagios-plugins-emi.glexec-0.3.0-1.sl5)
  • LFC probes (nagios-plugins-lfc-0.9.5-1.el5)
  • new MPI probes (egi-mpi-nagios-0.0.5-1.el5)
  • SRM probes (emi.dcache.srm-probes-1.0.0-1.el5)
  • UNICORE probes (unicore-nagios-plugins-2.2.1-1.sl5)
  • WMS probes (emi-wms-nagios-3.5.0-3.sl5)
  • WN replication probes (nagios-plugins-wn-rep-1.0.0-1.sl5)

The full list of metric changes is available at: SAM Doc FAQs

Installation and Configuration

SAM-Nagios

The new installation guide is available at: New SAM-Nagios install guide

An upgrade from previous SAM versions is not possible for this release. We strongly recommend installing SAM-Update 22 on a clean base operating system.

The database backup is no longer performed automatically as part of yaim. The following yaim function can be executed manually to create a backup of your database:

/opt/glite/yaim/bin/yaim -r -d 6 -s /etc/yaim/site-info.def -n SAM_NAGIOS -f config_mysql_backup
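
The manual backup step can be wrapped in a small shell function so that a missing yaim binary or an unreadable site-info file is reported before anything runs. This is only a sketch: run_db_backup, YAIM_BIN and SITE_INFO are hypothetical names introduced here, and the default paths are simply the ones quoted in the command above.

```shell
#!/bin/bash
# Sketch only: run_db_backup, YAIM_BIN and SITE_INFO are illustrative names,
# not part of yaim. The defaults are the paths quoted in these notes.
YAIM_BIN="${YAIM_BIN:-/opt/glite/yaim/bin/yaim}"
SITE_INFO="${SITE_INFO:-/etc/yaim/site-info.def}"

run_db_backup() {
    # Pre-checks: return 2 if yaim or the site-info file is not usable.
    if [ ! -x "$YAIM_BIN" ]; then
        echo "yaim not found or not executable: $YAIM_BIN" >&2
        return 2
    fi
    if [ ! -r "$SITE_INFO" ]; then
        echo "cannot read site-info file: $SITE_INFO" >&2
        return 2
    fi
    # Same invocation as documented above; its exit status is passed through.
    "$YAIM_BIN" -r -d 6 -s "$SITE_INFO" -n SAM_NAGIOS -f config_mysql_backup
}
```

Calling run_db_backup with no arguments uses the default paths. A return value of 2 means one of the pre-checks failed; any other value is whatever yaim itself returned.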

Added YAIM variables in this release:

Component | Name | Description | Default | Mandatory | Example
DB | DB_TMP_DIR | tmp directory for MySQL | Yes | No | "/var/tmp"

Removed YAIM variables in this release:

ENABLE_ARC_PROBES

To support the transparent migration of SAM to CNRS, please add the following yaim variables to your site-info:

ATP_ROOT_URL="http://mon.egi.eu/atp"
POEM_SYNC_URLS="http://mon.egi.eu/poem/api/0.1/json/"

For the ARC SRM and LFC tests to work, the following needs to be done:

1. The yaim variable JOBSUBMIT_WN_SE_REP_FILE must be set to the file where hr.srce.GoodSEs will store the list of working SRM endpoints, e.g.
   JOBSUBMIT_WN_SE_REP_FILE=GOOD_SES
2. The global attribute LFC_HOST must be set to the LFC in localdb, e.g.
   GLOBAL_ATTRIBUTE!LFC_HOST!prod-lfc-shared-central.cern.ch
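
Before running yaim, it can be useful to confirm that the yaim variables mentioned above are actually set. The following is a minimal sketch, assuming site-info is a plain file of NAME=value lines (for example /etc/yaim/site-info.def, the path used in the backup command). missing_yaim_vars is a hypothetical helper introduced here, not part of yaim, and it does not cover LFC_HOST, which is set in localdb rather than in site-info.

```shell
#!/bin/bash
# missing_yaim_vars FILE VAR...
# Hypothetical helper: prints every VAR that has no uncommented
# "VAR=" assignment in FILE and returns 1 if at least one is missing.
missing_yaim_vars() {
    local file="$1" var rc=0
    shift
    for var in "$@"; do
        # Lines starting with '#' do not match, so commented-out
        # assignments count as missing.
        if ! grep -Eq "^[[:space:]]*${var}=" "$file"; then
            echo "$var"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_yaim_vars /etc/yaim/site-info.def ATP_ROOT_URL POEM_SYNC_URLS JOBSUBMIT_WN_SE_REP_FILE` prints the names that still need to be added and prints nothing when all three are present.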

ARC SRM tests require additional configuration in the file /etc/nagios/plugins/arcnagios-local.ini. Since the directory is not the same on all SEs, admins must manually define the directory for each SE that might be used. Details can be found in the Switch section of the documentation: http://git.nbi.ku.dk/downloads/NorduGridARCNagiosPlugins/arcce.html#custom-substitutions-in-job-test-sections. Alternatively, se_host and se_test_dir can be used to define a single SE for ARC SRM tests.

SAM-Gridmon

The new installation guide is available at: New SAM-Gridmon install guide

The database deployment is not performed automatically as part of yaim. The following yaim function should be executed manually:

/opt/glite/yaim/bin/yaim -r -s /etc/lcg-quattor-site-info.def -n sam_gridmon -f config_database

Added YAIM variables in this release:

Component | Name | Description | Default | Mandatory | Example
MyWLCG | MYWLCG_REPORT_VO_ALL_SITES_PROFILES | Profiles to be used on VO All sites report | Yes | No | "atlas_critical cms_critical"

Known Issues

  1. For NGIs monitoring ARC or using ARC probes, there is a missing dependency that needs to be installed manually:
    $ yum install nordugrid-arc-plugins-globus
  2. If you use the latest CentOS 5.10 (or SL 5.10), be aware that the base repository now contains mysql51 packages. Since the base repository has a higher priority than the SAM repository, please modify the exclude lines for base and updates as follows:
    [base]
    priority=2
    protect=1
    exclude = perl-DBI mysql51*
    
    [updates]
    priority=2
    protect=1
    exclude = perl-DBI mysql51*
  3. If yaim fails with the following error after restoring the database from a backup, please apply the patch atp_service_type_update.patch*:
    INFO: Creating database schema
            Existing DB schema and versions:
    
                    - atp is currently 1.19
                    - metricstore is currently 1.17
                    - mddb is currently 1.1
                    - poem_sync is currently 1.3
            Upgrading atp DB to version 1.20
    ERROR: deploy_dbschema.pl failed, check /var/log/sam-db.log.
       ERROR: Configuration error !

    (*credits to Jan Astalos for the fix)

  4. eu.egi.mpi.complexjob.CREAMCE-JobState-/ops/Role=lcgadmin fails with error: SMPGranularity and HostNumber are mutually exclusive when WholeNodes allocation is not requested: wrong combination of values (more information at https://ggus.eu/ws/ticket_info.php?ticket=98851):
    In /usr/libexec/grid-monitoring/probes/eu.egi.mpi/complexjob/jdl.template 
    
    replace:
    HostNumber = 2;
    with:
    CPUNumber = 4;
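
The JDL template change from issue 4 can also be applied with a short script that keeps a copy of the original file. This is a sketch under stated assumptions: patch_mpi_jdl is a hypothetical helper name, and it assumes the template contains the text "HostNumber = 2;" with exactly that spacing, as quoted above. A later update of the probe package may well restore the original template, in which case the change would need to be reapplied.

```shell
#!/bin/bash
# patch_mpi_jdl FILE
# Hypothetical helper: replaces "HostNumber = 2;" with "CPUNumber = 4;"
# in FILE and keeps the unmodified content as FILE.orig.
patch_mpi_jdl() {
    local jdl="$1"
    if ! grep -q 'HostNumber = 2;' "$jdl"; then
        echo "no 'HostNumber = 2;' entry in $jdl, nothing to do" >&2
        return 0
    fi
    # Copy first, then rewrite FILE from the copy, so the original
    # is never lost if the substitution step fails.
    cp -p "$jdl" "$jdl.orig" &&
        sed 's/HostNumber = 2;/CPUNumber = 4;/' "$jdl.orig" > "$jdl"
}
```

It would be run as root, for example `patch_mpi_jdl /usr/libexec/grid-monitoring/probes/eu.egi.mpi/complexjob/jdl.template`. Running it a second time is harmless, because the pattern no longer matches.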

Package List

The full list of Update-22 packages and dependencies is available at: SAM Update-22 repository

SAM-Nagios changes

  • atp-1.27.13-1.el5
    - Removed usage of service types
    - Solved flapping problem of MPI service flavour
    - Improved service flavour synchronisation
    - Improved the event which removes services and sites in the local instances (now compatible with multiple regions)
    - Allowed VOs with more than one “.” in the name
    - Improved atp-sync output
    - Improved the downtime_update procedure
    - Fixed update mechanism for vofeed services belonging to multiple groups for the same physical site
    - Tested GOCDBv5 changes
    
  • glite-yaim-nagios-1.10.31-1.el5
    - glite-NAGIOS nodetype became sam_nagios
    - Improved services start/stop
    - Dependencies cleanup on msg-nagios-bridge
    - Removed backup of mrs-probes in ncg-localdb.d
    - Redesign of the database installations and upgrades
    - Consolidation of database backups
    - Database backup disabled by default
    - Added configurable tmp directory for MySQL
    - Improved database actions
    - Added grants in MySQL for DB_USER_W and DB_USER_R accounts
    
  • ncg-metric-config-1.3.13-1.el5
    Integration of EMI probes:
    - Integrated ARC probes
    - Integrated ARGUS probes
    - Integrated BDII probes
    - Integrated CREAMCE probes
    - Integrated FTS probes
    - Integrated GLEXEC probe
    - Integrated new JobMonit metrics
    - Mapping of org.sam.CE-JobSubmit
    - Integrated LFC probes
    - Integrated new MPI probes
    - Integrated emi.dcache SRM probes
    - Integrated UNICORE probes
    - Integrated WMS probes
    - Paths in ncg-metric-config replaced with emi-nagios ones
    
  • grid-monitoring-config-gen-0.93.6-1.el5
    - Integration of EMI probes
    - Enabled definition of passive metrics without parent
    - NCG dies when internal metric is not defined in ncg-metric-config
    
  • grid-monitoring-probes-hr.srce-0.37.0-1.el5
    - Added myproxy dependency
    
  • grid-monitoring-probes-ch.cern.sam-1.6.14-1.el5
    - CheckSamRelease probe improved
    - Improved SAMCentralWebAPI probe
    
  • grid-monitoring-probes-eu.egi.sec-1.0.10-2.el5
    - Added dependency on emi-cream-nagios
    
  • grid-monitoring-probes-cadist-0.5.0-1.el5
    - Added dependency on emi.creamce
    
  • mrs-1.7.43-1.el5
    - Improved automatic status recomputations
    - Improved status computations (taking FQANs into account)
    - Changed MySQL date handling
    - Added translation mechanism for metrics
    - Removed usage of service types
    
  • mywlcg-1.5.5-1.el5
    - Changed default landing page
    - Changed default SSL landing page
    - Removed xml, json, csv links (were broken)
    - Fixed incorrect grouping of service in MyEGI Status
    - Deleted downtimes are not considered for graphics generation
    - Solved problem with service availability graphs for flavours with “.” in the name
    - Improved status filter in Firefox browser
    - Fixed problem where status view lost data range (when coming from availability view)
    - Improved Data View filter in the Treemap view
    
  • mywlcg-atp-web-1.26.2-3.el5
    - Updated mywlcg release info import
    
  • perl-GridMon-1.0.73-1.el5
    - Modified GridMon::sgutils default GLOBUS location
    
  • poem-0.9.84-1.el5 and poem-sync-0.9.84-1.el5
    - Improved POEM configuration
    - Improved poem_sync configuration
    - Improved POEM synchronizer
    - Removed auto-tagging
    - Added poem diff script
    - Added new version of metricinstances API
    - Patched synchronizer to support metric renaming
    - Included new metric mapping configuration
    - Integration of EMI probes
    - Added backward compatible handler for api / expressions
    
  • python-GridMon-1.1.13-1.el5
    - Added nagios-submit configuration file
    
  • sam-db-1.0.19-1.el5
    - Redesign of the database installations and upgrades
    
  • sam-nagios-1.22.0-7.el5
    - Imported packages and added EMI probes dependencies
    - Added 3rd party on various nagios-plugins, updated EMI package versions
    
  • sam-release-1.22.0-1.el5
    - Build for Update-22
    

SAM-Gridmon changes

  • ace-1.2.6-1.el5
    - Decommission of service types
    
  • atp-1.27.13-1.el5
    - Removed usage of service types
    - Tested GOCDBv5 changes
    
  • glite-yaim-nagios-1.10.31-1.el5
    - Removed voms2httpasswd from sam-gridmon
    - glite-NAGIOS_WEB nodetype became sam_gridmon
    - Redesign of the database installations and upgrades
    
  • mrs-1.7.43-1.el5
    - Improved automatic status recomputations
    - Improved status computations (taking FQANs into account)
    - Added new dataloader algorithm
    - Added translation mechanism for metrics
    - Removed usage of service types
    - Decommissioned job METRICSTORE_PERFORMANCE_TUNING
    - Improved MRS view used by MyWLCG web service to get latest metric results in profile
    
  • msg-consume2db-1.0.23-1.el5
    - Fixed exception handling for messages containing multiple metrics
    
  • mywlcg-1.5.5-1.el5
    - Changed default landing page
    - Changed default SSL landing page
    - Removed xml, json, csv links (were broken)
    - Fixed incorrect grouping of service in MyEGI Status
    - Deleted downtimes are not considered for graphics generation
    - Solved problem with service availability graphs for flavours with “.” in the name
    - Improved status filter in Firefox browser
    - Fixed problem where status view lost data range (when coming from availability view)
    - Improved Data View filter in the Treemap view
    - Implemented new VO reports
    - Improved availability trends
    - Improved Availability and Reliability report
    - Improved Availability and Reliability reports generation
    - Improved Availability and Reliability reports Web Interface
    
  • mywlcg-atp-web-1.26.2-3.el5
    - Updated mywlcg release info import
    
  • poem-0.9.84-1.el5 and poem-sync-0.9.84-1.el5
    - Improved POEM configuration
    - Improved poem_sync configuration
    - Improved POEM synchronizer
    - Removed auto-tagging
    - Added poem diff script
    - Added new version of metricinstances API
    - Patched synchronizer to support metric renaming
    - Included new metric mapping configuration
    - Integration of EMI probes
    - Added backward compatible handler for api / expressions
    
  • sam-gridmon-1.22.0-2.el5
    - Added dependencies
    
  • sam-release-1.22.0-1.el5
    - Build for Update-22
    

Tickets List


Document generated by Confluence on Feb 27, 2014 10:19