This page last changed on Sep 15, 2011 by imamagic.
Release: Update-07
Summary
Start Date | 25 Oct 2010 |
End Date | 26 Nov 2010 |
Status | Released |
Release Date | 30 Nov 2010 |
Release Manager | James C |
Main Activities |
- ATP for NCG Topology
- Test ACE
- CA Distribution probe update
- ARC probes integration
- Misc NCG + Probe patches
|
Notes |
Update-07 is the first release that fully supports using the ATP as a topology provider instead of SAM. This is a major step forward, since it keeps the configuration via NCG and the display via MyEGI synchronized. |
Validation Steps performed
List of new metrics
CE & CREAM-CE:
Nagios:
- hr.srce.CADist-GetFiles - more info
- hr.srce.CADist-Check - more info
- org.nagiosexchange.LogFiles - metric checks /var/log/messages for messages coming from SAM components
- hr.srce.GoodSEs - see JOBSUBMIT_WN_SE_REP_FILE related release note below
- org.egee.MrsCheckMissingProbes - more info
ARC-CE (profile arc):
- org.arc.GRIDFTP
- org.arc.LFC
- org.arc.RLS
- org.arc.SRM
- org.arc.Jobsubmit
- org.arc.python
- org.arc.perl
- org.arc.gcc
- org.arc.csh
- org.arc.AUTH
- org.arc.CA-VERSION
- org.arc.SW-VERSION
- org.arc.ARC-STATUS
List of packages updated in this release
Configuration Changes
| Update-07 is the first release that fully supports using the ATP as a topology provider instead of SAM. This is a major step forward, since it keeps the configuration via NCG and the display via MyEGI synchronized. We recommend that all ROCs/NGIs enable ATP configuration. To enable it, set the following variables for YAIM in site-info.def
|
Nagios and probe errors on the Nagios host can now be monitored (SAM-896@jira). This is done by checking the /var/log/messages file. For the metric to work properly, the following YAIM variable has to be set:
Release Notes
- Starting with this release, more than one replication SE can be specified for the WN replica test org.sam.WN-RepRep, using a static mechanism, a dynamic mechanism, or both. Static: JOBSUBMIT_WN_SE_REP can now be set to a comma-separated list of hostnames. Dynamic: the new JOBSUBMIT_WN_SE_REP_FILE variable, if specified, must be a file name without a path (the path is generated dynamically by the respective metrics, based on the VO and/or FQAN for which the metrics are defined). The file is filled with the list of SEs defined on the Nagios instance that recently passed the org.sam.SRM-All set of tests. Setting the variable triggers execution of the local hr.srce.GoodSEs check, which generates the list of "good" SEs, and passes the file as an input parameter to the org.sam.{CREAM}CE-JobState metric(s). These metrics take at most 3 hosts from the file and, if JOBSUBMIT_WN_SE_REP is defined, append them to the static list. On the WN, org.sam.WN-RepRep tries to replicate to the SEs in the provided order until a replication succeeds. The metric returns CRITICAL if the file could not be replicated to any of the SEs. This fixes https://tomtools.cern.ch/jira/browse/SAM-442
- ARC probes are integrated with SAM. Additional actions required for ARC probes configuration are described here.
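The replication SE selection described in the first release note can be illustrated with a short Python sketch. It is not the probe code: the function names `build_se_list` and `replicate_in_order` and the `try_replicate` callback are hypothetical, and the sketch assumes that static hosts come first, that the hosts from the "good" SEs file are appended after them, and that duplicates are not removed (the release note does not say either way).

```python
from typing import Callable, Iterable, List, Optional, Tuple

# The release note states that at most 3 hosts are taken from the "good" SEs file.
MAX_DYNAMIC_SES = 3


def build_se_list(static_csv: str, good_ses: Iterable[str],
                  max_dynamic: int = MAX_DYNAMIC_SES) -> List[str]:
    """Combine the static comma-separated list with hosts from the "good" SEs file.

    static_csv stands in for the value of JOBSUBMIT_WN_SE_REP (may be empty);
    good_ses stands in for the lines of the file named by JOBSUBMIT_WN_SE_REP_FILE.
    """
    static = [host.strip() for host in static_csv.split(",") if host.strip()]
    dynamic = [host.strip() for host in good_ses if host.strip()][:max_dynamic]
    return static + dynamic


def replicate_in_order(ses: List[str],
                       try_replicate: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Try each SE in the given order and stop at the first success.

    try_replicate is a hypothetical callback that returns True when the
    replication to the given SE succeeded.
    """
    for se in ses:
        if try_replicate(se):
            return "OK", se
    # No SE accepted the replica: the metric result is CRITICAL.
    return "CRITICAL", None


# Illustrative host names only.
ses = build_se_list("se1.example.org, se2.example.org",
                    ["se3.example.org", "se4.example.org",
                     "se5.example.org", "se6.example.org"])
status, used = replicate_in_order(ses, lambda se: se == "se3.example.org")
print(status, used)
```

With the illustrative values above, the fourth host in the file is dropped because of the limit of three, and the loop stops at the first SE for which the callback reports success.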
Known Issues
We have detected two issues when using ATP as a topology provider instead of SAM.
| The first issue is that all Central-LFC services are also being mapped to the OPS VO. This bug is being tracked under JIRA ticket SAM-1003 |
| The second issue is that ATP does not yet contain metadata information for CEs, in particular the discovery of MPI flavours. This means that you cannot currently configure Nagios to test MPI if you are using ATP as the topology provider for NCG. In this case, we recommend that you keep using SAM, through the variables for YAIM in site-info.def
|
To configure Nagios, NCG needs to contact GOCDB. If GOCDB is down, the YAIM configuration fails. For example, running:
| /opt/glite/yaim/bin/yaim -s /root/yaim/site-info.def -c -n glite-UI -n glite-NAGIOS
results in this error:
...
INFO: Configuring for Profiles : ROC
INFO: Configuring ncg cronjob
INFO: Generating nagios configation from NCG (This might take a few minutes)...
Running ncg: FAILED
Enabling ncg cron: [ OK ]
ERROR: ncg service failed
ERROR: Configuration error !
/var/log/ncg.log contains:
Mon Dec 13 10:21:07 MSK 2010 : ERROR: Could not get results from GOCDB: 500 SSL negotiation failed:
Mon Dec 13 10:21:07 MSK 2010 : mv: cannot stat `/etc/nagios/wlcg.d.ncg.backup': No such file or directory |
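When YAIM reports "ncg service failed", checking /var/log/ncg.log for the GOCDB message shown above helps to confirm the cause. The following Python sketch does this for a list of log lines. It assumes the message wording from the excerpt above (other NCG versions may word it differently), and the function name `find_gocdb_errors` is hypothetical.

```python
import re
from typing import List

# Pattern taken from the sample log line above; this wording is an assumption
# for any other NCG version.
GOCDB_ERROR = re.compile(r"ERROR: Could not get results from GOCDB: (?P<reason>.*)$")


def find_gocdb_errors(log_lines: List[str]) -> List[str]:
    """Return the reason text of every GOCDB retrieval error found in log_lines."""
    reasons = []
    for line in log_lines:
        match = GOCDB_ERROR.search(line.rstrip("\n"))
        if match:
            reasons.append(match.group("reason").strip())
    return reasons


# The two lines from the ncg.log excerpt above.
sample = [
    "Mon Dec 13 10:21:07 MSK 2010 : ERROR: Could not get results from GOCDB: 500 SSL negotiation failed:",
    "Mon Dec 13 10:21:07 MSK 2010 : mv: cannot stat `/etc/nagios/wlcg.d.ncg.backup': No such file or directory",
]
for reason in find_gocdb_errors(sample):
    print("GOCDB query failed:", reason)
```

On a real host, the lines would be read from /var/log/ncg.log instead of the sample list. If the function returns a non-empty list, rerunning YAIM once GOCDB is reachable again is the likely remedy.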
List of Issues fixed in this release
Metrics for this release