This page last changed on Jun 18, 2013 by mbabik.
Release: Update-17.1
Summary
| Start Date | 25 July 2012 |
| End Date | 26 July 2012 |
| Status | Released |
| Release Date | 27 August 2012 |
| Release Manager | Wojciech Lapka |
| Main Activities | Migration from MDDB to POEM |
Validation Steps performed
List of packages updated in this release
Node sam-nagios
Node sam-gridmon
Release Notes
- ACE
- Possibility of defining the algorithm (ANDing or ORing) for service flavours
- New Nagios probes (for ops-monitor) for monitoring of ACE behaviour.
- Use associations between ATP groups (Tiers/Sites).
- Integration with POEM and data migration to POEM
- Bug fixes
- ATP
- Synchronize all external information sources
- User authentication in API for retrieving contacts
- Concurrency improvement: locking mechanism for ATP
- Keep associations between Tiers/PhysicalSites.
- Bug fixes.
- DAX
- New component: Data Transfer Computation Engine - generation of FTS graphs in the MyWLCG portal (only on node sam-gridmon)
- glite-yaim-nagios
- SAM concurrency improvements
- Enable filtering of metrics from remote Nagioses
- Support definition of multiple destinations in msg-consume2db (only sam-gridmon)
- grid-monitoring-probes-ch.cern.sam
- Probe for checking of deployed SAM-Nagios version
- grid-monitoring-probes-eu.egi.sec
- Ability to ignore expired CRLs in the CRL check
- Blacklisting of directories in the Permission probe
- Bug fixes
- MRS
- Decommissioning of Nagios metric 'org.egee.MrsCheckMissingProbes'
- Integration with POEM and data migration to POEM
- Computation logic based on POEM
- New bootstrapping mechanism
- Accept new aligned service type names for OSG services
- Bug fixes
- MyWLCG
- Integration with POEM
- MyWLCG filters - ANDing by default
- Integration of Data Transfers (only sam-gridmon)
- Changes in MyWLCG views:
- Hide site names in Gridmap view
- Removed views: "Service Status" and "Metric Status"
- Renamed view "Service Status History" to "Service Status"
- Service Availability: quality bars shown by default, labels below the Apply button removed, availability value shown in the hint rounded to 2 decimal places
- Simple benchmark for API
- Bug fixes
- Nagios
- Integration with new version of Nagios Core (3.3.1)
- NCG
- Integration with POEM
- Old SAM PI is no longer used for getting remote metric results
- Use ATP for retrieving service endpoints and user information
- Service ncg is replaced by sam-sync (see SAM-2518). The Yaim variable NAGIOS_NCG_ENABLE_CRON is removed and Yaim always switches on the sam-sync service. To switch off automatic config generation, sam-sync must be switched off after each Yaim run.
- Script ncg.reload.sh no longer executes external components (i.e. atp, mddb and mrs sync). The script can now be used for simple changes in the NCG config (e.g. localdb changes) without waiting for the synchronizers to finish.
- NCG::LocalMetrics::Hash is disabled and only NCG::LocalMetrics::POEM is used; the exceptions are site and security roles. The Yaim variables NCG_HASH_CONFIG_PROFILES and NCG_PROFILE_FQAN_* are removed. Profiles and FQAN mappings should be defined in the POEM profile.
- The NCG::LocalMetrics::Hash_local module is obsolete. Metric configuration files and POEM profiles should be used instead.
- DesktopGrid probes are integrated into SAM (see SAM-2421). For the probes to be configured properly, the URL field in GOCDB must point to the URL where the XML reports are stored (e.g. http://edgi-bridge.ibercivis.es/3gbridge_report_dir). No additional steps on the SAM box are needed for the probes to work.
- Starting from Update-17.1, the packages unicore-ucc and unicore-uvos-clc needed by the UNICORE probes are distributed as part of SAM, and manual installation is NOT needed.
- Several bug fixes
- POEM
- Import mddb profiles via yaim
- Concurrency improvement: locking mechanism for POEM
- Service poem-sync replaces mddb-sync for synchronizing profiles and metrics.
- For NGI-Nagios, the migration is transparent and no changes need to be applied.
- For VO-Nagioses, a namespace needs to be established with a profile that determines how Nagios, MRS and MyEGI are configured. Please follow the VO-Nagios section of the Installing SAM/Nagios guide.
- POEM User's Guide is available at POEM User's Guide
- RGF (only for sam-gridmon)
- Report generation framework
- New package for generating ACE reports
- Integration with POEM
- voms2htpasswd
- Take data from ATP for configuring the file '/etc/nagios/htpasswd.users'
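Because Yaim always switches on sam-sync (see the NCG notes above), sites that want automatic config generation off have to repeat the switch-off after every Yaim run. A minimal sketch follows. It assumes that sam-sync is a SysV init service controlled with /sbin/service and /sbin/chkconfig, as is usual on SL5, which these notes do not confirm. The function names run and disable_sam_sync and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: switch sam-sync off again after a Yaim run.
# ASSUMPTION: sam-sync is a SysV init service managed with
# /sbin/service and /sbin/chkconfig (typical on SL5); verify locally.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the service now and keep it from being started at boot.
disable_sam_sync() {
    run /sbin/service sam-sync stop
    run /sbin/chkconfig sam-sync off
}
```

Set DRY_RUN=1 first to print the two commands without executing them, then call disable_sam_sync as root after each Yaim run.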
Configuration changes (common)
- POEM configuration
See POEM User's Guide.
| For NGI-Nagios, the migration is transparent and no changes need to be applied. |
- New Yaim configuration variables:
Configuration changes (sam-gridmon)
- The main Oracle account needs an explicit grant to create tables (not one coming from a role).
- New Yaim configuration variables:
Configuration changes (sam-nagios)
- New Yaim configuration variables:
- Changed semantics of YAIM variables
- Removed YAIM variables
- Removed localdb configuration options (definition of metrics in localdb):
| The new version of the ARC probes provides a new config file /etc/grid-monitoring/org.ndgf.conf. After the upgrade, make sure that the new version is used:
mv /etc/grid-monitoring/org.ndgf.conf.rpmnew /etc/grid-monitoring/org.ndgf.conf
The new probes require the creation of a new testfile for the LFC service. Create an ops VOMS proxy and execute the following commands as a non-privileged user:
. /etc/grid-monitoring/org.ndgf.conf
hostname -f > /tmp/testfile
ngcp file:///tmp/testfile lfc://$LFC_PHYSICAL_URL/testfile@$LFC_HOST$LFC_LOGICAL_PATH/testfile
|
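The manual ARC configuration step can be made safe to repeat. The minimal sketch below moves org.ndgf.conf.rpmnew into place only when it exists and keeps a copy of the old file. The function name use_rpmnew_conf and the .pre-update-17.1 backup suffix are our own choices and are not part of the SAM packages.

```shell
#!/bin/sh
# Sketch: adopt the new ARC probe configuration shipped as .rpmnew.
# Function name and backup suffix are hypothetical, not from SAM.

use_rpmnew_conf() {
    conf="${1:-/etc/grid-monitoring/org.ndgf.conf}"
    if [ -f "$conf.rpmnew" ]; then
        # Keep a copy of the current file before replacing it.
        [ -f "$conf" ] && cp -p "$conf" "$conf.pre-update-17.1"
        mv "$conf.rpmnew" "$conf"
        echo "installed new $conf"
    else
        # Nothing to do: the package did not leave an .rpmnew file.
        echo "no $conf.rpmnew found; nothing to do"
    fi
}
```

Running it a second time is harmless, because the .rpmnew file no longer exists after the first run.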
Manual steps before upgrade
Apply the following steps before running YAIM (only when upgrading; not needed for clean SAM-Nagios installations):
| The EGI repository contains a wrong version of perl-Directory-Queue (perl-Directory-Queue-1.6-1.el5).
The correct version is perl-Directory-Queue-1.1-1.el5, which can be taken from the SAM repository
(http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos5/x86_64/).
Please downgrade it before doing the next steps. |
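Before downgrading, it can help to check which perl-Directory-Queue build is actually installed. Below is a minimal sketch. The function name check_dirqueue is hypothetical. You can pass the installed package string as an argument; otherwise the function queries it with rpm -q.

```shell
#!/bin/sh
# Sketch: report whether the installed perl-Directory-Queue is the
# wrong EGI build (1.6-1.el5) or the correct SAM build (1.1-1.el5).
# Returns 0 = ok, 1 = downgrade needed, 2 = anything else.

check_dirqueue() {
    # Use $1 if given, otherwise ask rpm for the installed package.
    installed="${1:-$(rpm -q perl-Directory-Queue 2>/dev/null)}"
    case "$installed" in
        perl-Directory-Queue-1.1-1.el5*)
            echo "ok: $installed" ;;
        perl-Directory-Queue-1.6-1.el5*)
            echo "downgrade needed: $installed"; return 1 ;;
        *)
            echo "unexpected: ${installed:-not installed}"; return 2 ;;
    esac
}
```

The downgrade itself can then be done with rpm -Uvh --oldpackage on the 1.1-1.el5 package downloaded from the SAM repository listed above.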
| Ensure that no processes are updating the database during the DB upgrade and delete obsolete metrics:
|
Known Issues
| If you experience an MDDB synchronizer error during yaim, please comment out the following line in /opt/glite/yaim/functions/config_mddb:
su nagios -l -c "/usr/bin/check_mddb_sync -t $MDDB_SYNC_TIMEOUT" |
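The MDDB workaround above can be applied non-interactively. This is a minimal sketch with two assumptions: GNU sed is available, and the line appears in config_mddb exactly as quoted, possibly indented. The function name disable_mddb_check is hypothetical, and a .bak copy of the file is kept.

```shell
#!/bin/sh
# Sketch: comment out the check_mddb_sync call in the YAIM function file.
# ASSUMPTION: GNU sed; the line matches the one quoted in the Known Issue.

disable_mddb_check() {
    f="${1:-/opt/glite/yaim/functions/config_mddb}"
    # Prefix '#' only to uncommented lines starting with the su call,
    # so running this twice does not double-comment the line.
    sed -i.bak '/^[[:space:]]*su nagios -l -c "\/usr\/bin\/check_mddb_sync/s/^/#/' "$f"
}
```

Restore config_mddb.bak if you need the original line back.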
| Due to SAM-2878, NGI Nagioses do not test services that are NOT in production.
If you need to test "non production" services, replace the file /usr/lib/perl5/vendor_perl/5.8.5/NCG/SiteInfo/ATP.pm from grid-monitoring-config-gen-0.89.7-1.el5 with the attached file SiteInfo/ATP.pm.
Note that this patch has a side effect: sites in candidate, certification or suspended status will also be monitored.
These sites can be removed manually by modifying the file /etc/ncg/ncg.localdb, e.g.
ncg needs to be run after this change. |
| For machines running the latest version of glite-UI (3.2.10-1 or higher):
please restart Nagios after running yaim; otherwise you may see problems similar to SAM-1693.
|
| Machines running the dax component (only sam-gridmon) need to be registered on the Msg Broker host so that the consumer can consume FTS messages. |
| There is a bug in a Yaim function, and NCG_BACKUP_INSTANCE is not properly updated (https://tomtools.cern.ch/jira/browse/SAM-2797). If you switch from a backup to an active instance, the variable BACKUP_INSTANCE in /etc/sysconfig/ncg stays set and keeps the instance from sending data out. The solution is to remove /etc/sysconfig/ncg and rerun Yaim. |
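On an instance being switched from backup to active, the SAM-2797 workaround can be scripted. This minimal sketch assumes BACKUP_INSTANCE appears in /etc/sysconfig/ncg as a shell-style assignment, optionally prefixed with export. The function name clear_stale_backup_flag is hypothetical, and rerunning Yaim is left to the administrator.

```shell
#!/bin/sh
# Sketch for the SAM-2797 workaround: remove /etc/sysconfig/ncg if it
# still sets BACKUP_INSTANCE, so the next Yaim run regenerates it.
# ASSUMPTION: the variable is written as [export ]BACKUP_INSTANCE=...
# Run this only on an instance that is becoming the active one.

clear_stale_backup_flag() {
    f="${1:-/etc/sysconfig/ncg}"
    if [ -f "$f" ] && grep -Eq '^[[:space:]]*(export[[:space:]]+)?BACKUP_INSTANCE=' "$f"; then
        rm -f "$f"
        echo "removed $f; now rerun Yaim"
    else
        echo "no BACKUP_INSTANCE in $f"
    fi
}
```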
| Because of an MRS bug (https://tomtools.cern.ch/jira/browse/SAM-3013), VO-independent metrics are not added to statuschange_service_profile in MySQL.
To fix it, please download the attached patch.sql and deploy it (while connected to your MySQL DB) with:
source patch.sql; |
List of new metrics
dg.CREAM-CE:
dg.ARC-CE:
dg.TargetSystemFactory:
List of Issues fixed in this release (excluding issues discovered during validation)
List of Issues fixed in this release (Discovered during nightly validation)