This page last changed on Sep 15, 2011 by imamagic.

Release: Update-07

Summary

Start Date: 25 Oct 2010
End Date: 26 Nov 2010
Status: Released
Release Date: 30 Nov 2010
Release Manager: James C
Main Activities:
  • ATP for NCG Topology
  • Test ACE
  • CA Distribution probe update
  • ARC probes integration
  • Misc NCG + Probe patches
Notes: Update-07 is the first release that fully supports using ATP instead of SAM as the topology provider. This is a major step forward, since it keeps the configuration generated by NCG and the display in MyEGI synchronized.

Validation Steps performed

List of new metrics

CE & CREAM-CE:

Nagios:

  • hr.srce.CADist-GetFiles
  • hr.srce.CADist-Check
  • org.nagiosexchange.LogFiles - checks /var/log/messages for messages coming from SAM components
  • hr.srce.GoodSEs - see the JOBSUBMIT_WN_SE_REP_FILE release note below
  • org.egee.MrsCheckMissingProbes

ARC-CE (profile arc):

  • org.arc.GRIDFTP
  • org.arc.LFC
  • org.arc.RLS
  • org.arc.SRM
  • org.arc.Jobsubmit
  • org.arc.python
  • org.arc.perl
  • org.arc.gcc
  • org.arc.csh
  • org.arc.AUTH
  • org.arc.CA-VERSION
  • org.arc.SW-VERSION
  • org.arc.ARC-STATUS

List of packages updated in this release

atp-1.15.6-3.el5
atp-web-1.15.6-3.el5
egee-NAGIOS-1.0.0-56.el5
egee-NRPE-1.0.0-18.el5
glite-yaim-nagios-1.0.121-1.el5
grid-monitoring-config-gen-0.75.2-3.el5
grid-monitoring-fm-nagios-remote-0.19.1-1.el5
grid-monitoring-org.activemq-probes-0.8-3.el5
grid-monitoring-org.ggus-probes-0.6-1.el5
grid-monitoring-org.nagiosexchange-probes-0.11-1.el5
grid-monitoring-probes-cadist-0.1.1-1.el5
grid-monitoring-probes-ch.cern-0.20.1-1.el5
grid-monitoring-probes-hr.srce-0.30.2-1.el5
grid-monitoring-probes-org.ndgf-0.3-1.el5
grid-monitoring-probes-org.sam-0.1.18-1.el5
gstat-validation-2.0.41-1.el5
jmx4perl-0.73-1.el5
myegi-0.2.8-2.el5
nagios-gocdb-downtime-0.21-1.el5
nagios2metricstore-1.0.29-3.el5
poem-0.1-1.el5
poem-sync-0.1-1.el5
python-GridMon-1.1.12-1.el5
voms2htpasswd-1.7-1.el5

Configuration Changes

As noted in the summary, Update-07 is the first release that fully supports using ATP instead of SAM as the topology provider, which keeps the configuration generated by NCG and the display in MyEGI synchronized. We recommend that all ROCs/NGIs enable ATP-based configuration. To do so, set the following YAIM variables in site-info.def:
NCG_TOPOLOGY_USE_SAM=false
NCG_TOPOLOGY_USE_ATP=true
NCG_TOPOLOGY_ATP_ROOT_URL="http://grid-monitoring.cern.ch/atp"
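The three settings above can be sanity-checked before running YAIM. The following is a minimal Python sketch, assuming site-info.def contains only simple KEY=value assignments (it is a shell-style variable file, so expansions, export statements and conditionals are not handled here). The helper names parse_site_info and topology_provider are hypothetical and are not part of NCG or YAIM; the sketch also does not model what NCG does when both or neither flag is set.

```python
import shlex


def parse_site_info(text):
    """Parse simple KEY=value lines from a site-info.def-style file.

    Assumption: only plain assignments are handled; shell features such as
    variable expansion, 'export' or conditionals are ignored.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if not key.isidentifier():
            continue  # skip lines that are not plain assignments
        # shlex removes surrounding quotes and trailing '#' comments
        parts = shlex.split(value, comments=True)
        values[key] = parts[0] if parts else ""
    return values


def topology_provider(settings):
    """Report which topology source the two NCG flags select.

    Returns "ATP" or "SAM" when exactly one flag is true, otherwise
    "undetermined" (NCG's own defaults are not modelled here).
    """
    use_sam = settings.get("NCG_TOPOLOGY_USE_SAM", "").lower() == "true"
    use_atp = settings.get("NCG_TOPOLOGY_USE_ATP", "").lower() == "true"
    if use_atp and not use_sam:
        return "ATP"
    if use_sam and not use_atp:
        return "SAM"
    return "undetermined"


example = """
NCG_TOPOLOGY_USE_SAM=false
NCG_TOPOLOGY_USE_ATP=true
NCG_TOPOLOGY_ATP_ROOT_URL="http://grid-monitoring.cern.ch/atp"
"""

settings = parse_site_info(example)
print(topology_provider(settings))            # ATP
print(settings["NCG_TOPOLOGY_ATP_ROOT_URL"])  # http://grid-monitoring.cern.ch/atp
```

The same check applied to the MPI workaround described under Known Issues (SAM=true, ATP=false) would report "SAM".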

This release adds support for monitoring Nagios and probe errors on the Nagios host (SAM-896@jira), which is done by inspecting the /var/log/messages file. For the metric to work properly, the following YAIM variable must be set:

NAGIOS_SUDO_ENABLE_CONFIG=true
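The idea behind this check can be illustrated with a short Python sketch that scans syslog-format lines, such as those in /var/log/messages, for entries written by a given set of programs. This is not the org.nagiosexchange.LogFiles implementation: the program tags ("nagios", "ncg"), the sample lines, and the rule of returning CRITICAL on any match are assumptions chosen for illustration; only the 0/2 exit codes follow the standard Nagios plugin convention.

```python
import re

# Classic syslog layout: "Mon DD HH:MM:SS host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:\s?(.*)$"
)

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def scan_messages(lines, tags):
    """Return (status, matches) for syslog lines whose program tag is in tags.

    Assumption for illustration: any matching line makes the result CRITICAL.
    """
    matches = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m and m.group(1) in tags:
            matches.append((m.group(1), m.group(2)))
    return (CRITICAL if matches else OK), matches


# Hypothetical sample lines and program tags
sample = [
    "Dec 13 10:21:07 nagios01 nagios: Warning: check of service timed out",
    "Dec 13 10:21:08 nagios01 sshd[1234]: Accepted publickey for root",
    "Dec 13 10:21:09 nagios01 ncg[42]: probe configuration missing",
]
status, found = scan_messages(sample, {"nagios", "ncg"})
print(status)  # 2
for tag, message in found:
    print(tag, "->", message)
```

Lines from unrelated programs (sshd in the sample) are ignored, so only messages from the monitored components affect the status.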

Release Notes

  • Starting with this release, more than one replication SE can be specified for the WN replica test org.sam.WN-RepRep, using a static mechanism, a dynamic mechanism, or both. For the static mechanism, JOBSUBMIT_WN_SE_REP can now be set to a comma-separated list of hostnames. For the dynamic mechanism, the new JOBSUBMIT_WN_SE_REP_FILE variable, if specified, should be a file name without a path (the path is generated dynamically by the respective metrics, based on the VO and/or FQAN for which the metrics are defined). This file is filled with the list of SEs defined on the Nagios instance that have recently passed the org.sam.SRM-All set of tests. Setting the variable triggers execution of the local hr.srce.GoodSEs check, which generates the list of "good" SEs, and passes the file as an input parameter to the org.sam.{CREAM}CE-JobState metric(s). The latter take at most 3 hosts from the file and, if JOBSUBMIT_WN_SE_REP was defined, append them to the static list. On the WN, org.sam.WN-RepRep tries to replicate to each SE in the given order until a replication succeeds. The metric returns CRITICAL if the file could not be replicated to any of the SEs. This fixes https://tomtools.cern.ch/jira/browse/SAM-442
  • ARC probes are integrated with SAM. Additional actions required for ARC probes configuration are described here.
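The SE selection and failover described in the first note above can be sketched in Python as follows. This is a minimal sketch, not the org.sam probe code: it assumes the "good SEs" file holds one hostname per line and that its first entries are the ones taken, and the replicate callable is a hypothetical stand-in for the real replication step. As the note describes, static SEs come first, up to three dynamic SEs are appended, and the result is CRITICAL only if every SE fails. No de-duplication is done, since the note does not mention it.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def static_ses(value):
    """Split JOBSUBMIT_WN_SE_REP (comma-separated hostnames) into a list."""
    return [h.strip() for h in (value or "").split(",") if h.strip()]


def candidate_ses(static_value, good_se_lines, max_dynamic=3):
    """Static SEs first, then at most max_dynamic hosts from the good-SEs file.

    Assumption: the file lists one hostname per line and the first entries
    are taken. No de-duplication is done.
    """
    dynamic = [h.strip() for h in good_se_lines if h.strip()][:max_dynamic]
    return static_ses(static_value) + dynamic


def replicate_with_failover(ses, replicate):
    """Try SEs in order until one replication succeeds.

    replicate is a hypothetical callable standing in for the real
    replication step; it returns True on success.
    """
    for se in ses:
        if replicate(se):
            return OK, se
    return CRITICAL, None  # file could not be replicated to any SE


ses = candidate_ses(
    "se1.example.org, se2.example.org",
    ["se3.example.org", "se4.example.org", "se5.example.org", "se6.example.org"],
)
print(ses)  # five hosts: two static, then three from the file (se6 is dropped)
print(replicate_with_failover(ses, lambda se: se == "se3.example.org"))  # (0, 'se3.example.org')
print(replicate_with_failover(ses, lambda se: False))                    # (2, None)
```

If JOBSUBMIT_WN_SE_REP is not defined, candidate_ses simply returns the (at most three) hosts taken from the file.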

Known Issues

Two issues have been detected when using ATP instead of SAM as the topology provider.

All Central-LFC services are also being mapped to the OPS VO. This bug is tracked in JIRA ticket SAM-1003.
The second issue is that ATP does not yet contain metadata for CEs, in particular the discovered MPI flavours. This means that Nagios cannot currently be configured to test MPI when ATP is used as the topology provider for NCG. If you need MPI tests, we recommend that you keep using SAM by setting the following YAIM variables in site-info.def:
NCG_TOPOLOGY_USE_SAM=true
NCG_TOPOLOGY_USE_ATP=false

NCG needs to contact GOCDB in order to generate the Nagios configuration. If GOCDB is unavailable, running

/opt/glite/yaim/bin/yaim -s /root/yaim/site-info.def -c -n glite-UI -n glite-NAGIOS

fails with the following error:
...
INFO: Configuring for Profiles : ROC
INFO: Configuring ncg cronjob
INFO: Generating nagios configation from NCG (This might take a few minutes)...
Running ncg: FAILED
Enabling ncg cron: [ OK ]
ERROR: ncg service failed
ERROR: Configuration error !

/var/log/ncg.log contains:

Mon Dec 13 10:21:07 MSK 2010 : ERROR: Could not get results from GOCDB: 500 SSL negotiation failed:
Mon Dec 13 10:21:07 MSK 2010 : mv: cannot stat `/etc/nagios/wlcg.d.ncg.backup': No such file or directory
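Since the YAIM output only reports that the ncg service failed, it can help to confirm that GOCDB was the cause by looking for the GOCDB error line in /var/log/ncg.log. Below is a minimal Python sketch, assuming the line format shown above (a timestamp, then " : ERROR: Could not get results from GOCDB:", then the reason); the function name gocdb_failures is hypothetical and not part of NCG.

```python
import re

# Line format as shown in /var/log/ncg.log above:
# "<timestamp> : ERROR: Could not get results from GOCDB: <reason>"
GOCDB_ERROR_RE = re.compile(
    r"^(?P<ts>.+?) : ERROR: Could not get results from GOCDB:\s?(?P<reason>.*)$"
)


def gocdb_failures(lines):
    """Return (timestamp, reason) pairs for GOCDB errors in ncg.log lines."""
    failures = []
    for line in lines:
        m = GOCDB_ERROR_RE.match(line.rstrip("\n"))
        if m:
            failures.append((m.group("ts"), m.group("reason")))
    return failures


log = [
    "Mon Dec 13 10:21:07 MSK 2010 : ERROR: Could not get results from GOCDB: 500 SSL negotiation failed:\n",
    "Mon Dec 13 10:21:07 MSK 2010 : mv: cannot stat `/etc/nagios/wlcg.d.ncg.backup': No such file or directory\n",
]
for ts, reason in gocdb_failures(log):
    print(ts, "|", reason)  # Mon Dec 13 10:21:07 MSK 2010 | 500 SSL negotiation failed:
```

On a real host the same function can be applied to the open log file, e.g. gocdb_failures(open("/var/log/ncg.log")); an empty result suggests that NCG failed for a reason other than GOCDB being unavailable.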

Document generated by Confluence on Feb 27, 2014 10:19