This page last changed on Jun 18, 2013 by mbabik.

Release: Update-17.1

Summary

Start Date 25 July 2012
End Date 26 July 2012
Status Released
Release Date 27 August 2012
Release Manager Wojciech Lapka
Main Activities Migration from MDDB to POEM

Validation Steps performed

List of packages updated in this release

Node sam-nagios

atp-1.23.9-1.el5
atp-web-1.23.9-1.el5
glite-yaim-nagios-1.7.40-7.el5
grid-monitoring-config-gen-0.89.7-1.el5
grid-monitoring-probes-ch.cern.sam-1.6.2-1.el5
grid-monitoring-probes-eu.egi.sec-1.0.6-1.el5
grid-monitoring-probes-org.ndgf-0.10-2.el5
grid-monitoring-probes-org.sam-0.4.1-1.el5
mrs-1.7.6-4.el5
msg-nagios-bridge-1.0.63-1.el5
mywlcg-1.2.8-3.el5
nagios-3.3.1-1.el5.rf.1oat
nagios-plugins-dg-1.0.1-1.el5
ncg-metric-config-1.0.5-1.el5
perl-Net-STOMP-Client-1.2-1.el5
perl-TOM-2.3-1.el5
poem-0.9.5-1.el5
poem-sync-0.9.5-1.el5
sam-nagios-1.17.2-1.el5
sam-release-1.17.0-1.el5
sam-sync-1.0.6-1.el5
unicore-nagios-plugins-2.1.0-1
unicore-ucc6-5.0.0-1.sl5
unicore-uvos-clc-1.6.0-0.sl5
voms2htpasswd-1.12.2-1.el5

New dependencies (not included in the sa1 repository)
java-1.6.0-openjdk
tzdata-java

Dependencies removed
unicore-monitoring-probes

Node sam-gridmon

ace-0.1.37-1.el5
atp-1.23.9-1.el5
atp-web-1.23.9-1.el5
dax-1.0.7-1.el5
glite-yaim-nagios-1.7.40-7.el5
mrs-1.7.6-4.el5
mywlcg-1.2.8-3.el5
ncg-metric-config-1.0.5-1.el5
openreports-3.2.07-2
perl-Net-STOMP-Client-1.2-1.el5
perl-TOM-2.3-1.el5
poem-0.9.5-1.el5
poem-sync-0.9.5-1.el5
rgf-1.0.4-1
sam-gridmon-1.17.2-1.el5
sam-release-1.17.0-1.el5
sqlalchemy-0.7.5-4.el5
voms2htpasswd-1.12.2-1.el5

New dependencies (not included in the sa1 repository)
PyXML
curl
libidn
python-curl

Release Notes

  • ACE
    • Possibility of defining algorithm (ANDing or ORing) for service flavours
    • New Nagios probes (for ops-monitor) for monitoring of ACE behaviour.
    • Use associations between ATP groups (Tiers/Sites).
    • Integration with POEM and data migration to POEM
    • Bug fixes
  • ATP
    • Synchronize all external information sources
    • User authentication in API for retrieving contacts
    • Concurrency improvement: locking mechanism for ATP
    • Keep associations between Tiers/PhysicalSites.
    • Bug fixes.
  • DAX
    • New component: Data Transfer Computation Engine - generation of FTS graphs in the MyWLCG portal (only on node sam-gridmon)
  • glite-yaim-nagios
    • SAM concurrency improvements
    • Enable filtering of metrics from remote Nagioses
    • Support definition of multiple destinations in msg-consume2db (only sam-gridmon)
  • grid-monitoring-probes-ch.cern.sam
    • Probe for checking of deployed SAM-Nagios version
  • grid-monitoring-probes-eu.egi.sec
    • Ability to ignore expired CRLs on CRL check
    • Blacklisting directories in Permission probe
    • Bug fixes
  • MRS
    • Decommissioning of Nagios metric 'org.egee.MrsCheckMissingProbes'
    • Integration with POEM and data migration to POEM
      • Computation logic based on POEM
      • New bootstrapping mechanism
    • Accept new aligned service type names for OSG services
    • Bug fixes
  • MyWLCG
    • Integration with POEM
    • MyWLCG filters - ANDing by default
    • Integration of Data Transfers (only sam-gridmon)
    • Changes in MyWLCG views:
      • Hide site names in Gridmap view
      • Removed views: "Service Status" and "Metric Status"
      • Renamed view "Service Status History" to "Service Status"
      • Service Availability: quality bars by default, labels below the Apply button removed, availability value shown in the hint rounded to 2 decimal places
    • Simple benchmark for API
    • Bug fixes
  • Nagios
    • Integration with new version of Nagios Core (3.3.1)
  • NCG
    • Integration with POEM
    • Don't use old SAM PI for getting remote metric results
    • Use ATP for retrieving service endpoints and users information
    • Service ncg is replaced by sam-sync (see SAM-2518). The Yaim variable NAGIOS_NCG_ENABLE_CRON is removed and Yaim always switches on the sam-sync service. To disable automatic config generation, switch off sam-sync after each Yaim run.
    • Script ncg.reload.sh no longer executes external components (i.e. the atp, mddb and mrs synchronizers). It can now be used for simple changes in the NCG config (e.g. localdb changes) without waiting for the synchronizers to finish.
    • NCG::LocalMetrics::Hash is disabled and only NCG::LocalMetrics::POEM is used. The exceptions are site and security roles. Yaim variables NCG_HASH_CONFIG_PROFILES and NCG_PROFILE_FQAN_* are removed. Profiles and FQAN mappings should be defined in the POEM profile.
    • The NCG::LocalMetrics::Hash_local module is obsolete. Metric configuration files and POEM profiles should be used instead.
    • DesktopGrid probes are integrated into SAM (see SAM-2421). For the probes to be configured properly, the URL field in GOCDB must point to the URL where the XML reports are stored (e.g. http://edgi-bridge.ibercivis.es/3gbridge_report_dir). No additional steps on the SAM box are needed for the probes to work.
    • Starting from Update-17.1, the packages unicore-ucc and unicore-uvos-clc needed for UNICORE probes are distributed as part of SAM and manual installation is NOT needed.
    • Several bug fixes
  • POEM
    • Import mddb profiles via yaim
    • Concurrency improvement: locking mechanism for POEM
    • Service poem-sync replaces mddb-sync to synchronize profiles and metrics.
    • For NGI-Nagios the migration is transparent and no changes need to be applied.
    • For VO-Nagioses a namespace needs to be established with a profile that determines how Nagios, MRS and MyEGI are configured. Please follow the VO-Nagios section of the Installing SAM/Nagios guide.
    • See the POEM User's Guide for details.
  • RGF (only for sam-gridmon)
    • Report generation framework
    • New package for generating ACE reports
    • Integration with POEM
  • voms2htpasswd
    • Take data from ATP for configuring file '/etc/nagios/htpasswd.users'

Configuration changes (common)

  • POEM configuration
    See POEM User's Guide.
    For NGI-Nagios the migration is transparent and no changes need to be applied.
  • New Yaim configuration variables:
    MYEGI_DEFAULT_PROFILE - default profile in MyEGI (default: ROC_CRITICAL)
    MYWLCG_THROTTLE - If THROTTLE is set to True, limit the number of accesses per IP address in a given time in seconds (default: False)
    POEM_WEB_ENABLE - enable poem web instance (default on SAM-Gridmon)
    POEM_NAMESPACE - poem web instance namespace (default: 'ch.cern.sam')
    POEM_ATP_ROOT_URL - poem web instance ATP URL (default: "http://localhost")
    POEM_IMPORT_FROM_MDDB - if True bootstrap profiles from MDDB otherwise use a fixture file (default: False)
    POEM_DEBUG - enable poem web instance debug
    POEM_ADMIN_NAME - poem web instance admin name
    POEM_ADMIN_EMAIL - poem web instance admin e-mail
    POEM_SYNC_URLS - URLs to synchronize from (pointing to poem web instances; sam-nagios defaults to grid-monitoring; sam-gridmon defaults to localhost)
    POEM_SYNC_NS_RESTRICT - restrict synchronization of profiles for given namespace (space separated namespace!profile values)
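
As a rough illustration, the common variables above could appear in a Yaim configuration file as follows. The file itself (commonly site-info.def) is an assumption, since these notes do not name it. Values marked as defaults are copied from the list above; the admin name, the e-mail address and the POEM_SYNC_NS_RESTRICT value are placeholders chosen only to show the format.

```shell
# Hypothetical excerpt of a Yaim configuration file (e.g. site-info.def).

# Defaults as documented in the list above.
MYEGI_DEFAULT_PROFILE="ROC_CRITICAL"
MYWLCG_THROTTLE="False"
POEM_NAMESPACE="ch.cern.sam"
POEM_ATP_ROOT_URL="http://localhost"
POEM_IMPORT_FROM_MDDB="False"

# No defaults are documented for these; the values are placeholders.
POEM_ADMIN_NAME="SAM Admin"
POEM_ADMIN_EMAIL="sam-admin@example.org"

# Space-separated namespace!profile values. This particular pair is an
# illustrative assumption. Single quotes keep the '!' literal in
# interactive shells.
POEM_SYNC_NS_RESTRICT='ch.cern.sam!ROC_CRITICAL'
```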
    

Configuration changes (sam-gridmon)

  • Ask for explicit grant to create tables (not coming from role) for main Oracle account.
  • New Yaim configuration variables:
    DAX_MSG_HOST - Substitutes the name of the Broker host in the consumer configuration of DAX component (Default: "dashb-mb")
    MYWLCG_DATA_TRANSFER - Enables the Data Transfer Module in MyWLCG (Default: False)
    MYWLCG_DT_VO_OTHERS_LIMIT - Place VOs in category 'Others' when the total aggregated data transfer or avg. throughput is less than MYWLCG_DT_VO_OTHERS_LIMIT (Default: 2)
    MYWLCG_DT_SRCSITE_OTHERS_LIMIT - Place Source Sites in category 'Others' when the total aggregated data transfer or avg. throughput is less than MYWLCG_DT_SRCSITE_OTHERS_LIMIT (Default: 2)
    MYWLCG_DT_DSTSITE_OTHERS_LIMIT - Place Destination Sites in category 'Others' when the total aggregated data transfer or avg. throughput is less than MYWLCG_DT_DSTSITE_OTHERS_LIMIT (Default: 5)
    OPENREPORTS_ADMIN - admin user for openreports.
    OPENREPORTS_ADMIN_PASS - admin password for openreports.
    

Configuration changes (sam-nagios)

  • New Yaim configuration variables:
    ATP_ROOT_URL - default value https://grid-monitoring.cern.ch/atp (Always needs to be https).
    NCG_CONTACTS_USE_ATP - use ATP to generate contact lists, requires NCG_TOPOLOGY_USE_ATP to be set (default: true)
    NCG_POEM_ROOT_URL - URL of POEM sync that NCG will use (default: "http://localhost/poem_sync")
    NCG_REMOTE_NAGIOS_HOSTS - list of hosts from where results will be imported, used only on site instance if NCG_REMOTE_USE_NAGIOS is true
    
  • Changed semantics of YAIM variables
    NCG_TOPOLOGY_USE_ATP - switches on both NCG::SiteSet and NCG::SiteInfo, also required for NCG::SiteContacts (default: true)
    NCG_TOPOLOGY_USE_GOCDB - switches on both NCG::SiteSet and NCG::SiteInfo, also required for NCG::SiteContacts (default: false)
    NCG_TOPOLOGY_USE_LDAP - switches on both NCG::SiteInfo and NCG::LocalMetricsAttrs (default: false)
    
  • Removed YAIM variables
    NCG_MDDB_SUPPORTED_PROFILES
    NCG_HASH_CONFIG_PROFILES
    NCG_PROFILE_FQAN_*
    MDDB_SYNC_TIMEOUT
    NAGIOS_NCG_ENABLE_CRON
    NCG_TOPOLOGY_ATP_ROOT_URL
    NCG_TOPOLOGY_USE_SAM
    NCG_TOPOLOGY_USE_ENOC
    NCG_REMOTE_USE_ENOC
    
  • Removed localdb configuration options (definition of metrics in localdb):
    ADD_PROFILE_SERVICE_METRIC!profile!service!metric
    METRIC_PROBE!metric!probe
    METRIC_METRICSET!metric!metricset
    METRIC_DOCURL!metric!url
    METRIC_NATIVE!metric!native
    METRIC_CONFIG!metric!config!value
    METRIC_DEPENDENCY!metric!metricParent!value
    METRIC_ATTRIBUTE!metric!attribute!value
    METRIC_FLAG!metric!flag
    METRIC_PARENT!metric!parent
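
Leftovers of the removed Yaim variables and localdb options can be spotted with a small scan before running Yaim. The following is a minimal sketch and not part of SAM: the function names are made up for illustration, and the files to scan are passed as arguments because their location depends on the installation.

```shell
#!/bin/bash
# Yaim variables removed in Update-17.1 (NCG_PROFILE_FQAN_ is a prefix).
REMOVED_YAIM_VARS='NCG_MDDB_SUPPORTED_PROFILES|NCG_HASH_CONFIG_PROFILES|NCG_PROFILE_FQAN_[A-Za-z0-9_]*|MDDB_SYNC_TIMEOUT|NAGIOS_NCG_ENABLE_CRON|NCG_TOPOLOGY_ATP_ROOT_URL|NCG_TOPOLOGY_USE_SAM|NCG_TOPOLOGY_USE_ENOC|NCG_REMOTE_USE_ENOC'

# localdb options removed in Update-17.1.
REMOVED_LOCALDB_OPTS='ADD_PROFILE_SERVICE_METRIC|METRIC_PROBE|METRIC_METRICSET|METRIC_DOCURL|METRIC_NATIVE|METRIC_CONFIG|METRIC_DEPENDENCY|METRIC_ATTRIBUTE|METRIC_FLAG|METRIC_PARENT'

# Print "file:line:text" for every assignment of a removed Yaim variable.
# Exit status follows grep: 0 if something was found, 1 if nothing was found.
find_removed_yaim_vars() {
    grep -nHE "^[[:space:]]*(export[[:space:]]+)?(${REMOVED_YAIM_VARS})=" "$@"
}

# Print "file:line:text" for every removed localdb option.
# Exit status follows grep: 0 if something was found, 1 if nothing was found.
find_removed_localdb_opts() {
    grep -nHE "^[[:space:]]*(${REMOVED_LOCALDB_OPTS})!" "$@"
}
```

A possible invocation is find_removed_yaim_vars /path/to/site-info.def followed by find_removed_localdb_opts /etc/ncg/ncg.localdb, where the first path is a placeholder. Matching is anchored at the start of the line, so the new variable ATP_ROOT_URL is not confused with the removed NCG_TOPOLOGY_ATP_ROOT_URL.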
    
The new version of the ARC probes provides a new config file, /etc/grid-monitoring/org.ndgf.conf. After the upgrade, make sure that the new version is used:
mv /etc/grid-monitoring/org.ndgf.conf.rpmnew /etc/grid-monitoring/org.ndgf.conf
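
The rename above can be guarded so that it only runs when an .rpmnew file actually exists, and so that the previous file is kept. This is a minimal sketch; the function name and the .pre-update backup suffix are assumptions made for illustration.

```shell
#!/bin/bash
# Activate a packaged config saved as <file>.rpmnew.
# The previous <file>, if any, is kept as <file>.pre-update.
activate_rpmnew() {
    local conf="$1"
    if [ ! -f "${conf}.rpmnew" ]; then
        echo "no ${conf}.rpmnew found, nothing to do"
        return 0
    fi
    if [ -f "$conf" ]; then
        cp -p "$conf" "${conf}.pre-update"
    fi
    mv "${conf}.rpmnew" "$conf"
}
```

It would be called as activate_rpmnew /etc/grid-monitoring/org.ndgf.conf. Running it a second time is harmless because the .rpmnew file is then gone.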

The new probes require a new test file for the LFC service. Create an ops VOMS proxy and execute the following commands as a non-privileged user:

. /etc/grid-monitoring/org.ndgf.conf
hostname -f > /tmp/testfile
ngcp file:///tmp/testfile lfc://$LFC_PHYSICAL_URL/testfile@$LFC_HOST$LFC_LOGICAL_PATH/testfile

In SAM Update-17 the UNICORE configuration is slightly different:
  • NCG_TOPOLOGY_USE_GOCDB should not be set to true
  • Global registry URL should be defined via localdb files.

For details see: https://tomtools.cern.ch/confluence/pages/viewpage.action?pageId=39157787#SAMsetupforUNICOREservices-SAMconfiguration.

Manual steps before upgrade

Apply the following steps before executing YAIM (only when upgrading; not needed for clean SAM-Nagios installations):

The EGI repository contains the wrong version of perl-Directory-Queue (perl-Directory-Queue-1.6-1.el5).
The correct version is perl-Directory-Queue-1.1-1.el5, which can be taken from the SAM repository
(http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos5/x86_64/).
Please downgrade it before proceeding with the next steps.
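
Whether the downgrade is still needed can be checked by comparing the installed version with the one named above. The sketch below uses the standard rpm query format option; the function names are made up for illustration.

```shell
#!/bin/bash
# Version-release of perl-Directory-Queue required by these notes.
REQUIRED_DQ="1.1-1.el5"

# Succeeds when the given version-release string is the required one.
directory_queue_ok() {
    [ "$1" = "$REQUIRED_DQ" ]
}

# Print the installed version-release of perl-Directory-Queue.
installed_directory_queue() {
    rpm -q --qf '%{VERSION}-%{RELEASE}\n' perl-Directory-Queue
}
```

On the SAM box this could be used as: directory_queue_ok "$(installed_directory_queue)" || echo "downgrade perl-Directory-Queue first"
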
Ensure that no processes are updating the database during the DB upgrade and delete obsolete metrics:
  • Block execution of Nagios check org.egee.SendToMetricStore
    Temporarily change the password in /etc/nagios/plugins/send_to_db.ini.
    Then verify that the Nagios check org.egee.SendToMetricStore fails after your change.
  • Drop mysql events
    mysql -h <host> -u <user> -p
    use mrs;
    DROP EVENT IF EXISTS `REMOVE_SERVICES_AND_SITES_EVENT`;
    exit;
    [root]# cd /usr/share/doc/mrs-*/DBScripts/upgrades/1.11/mysql/
    mysql -h <host> -u <user> -p
    source drop_events.sql;
    show events;
    exit;

    You shouldn't see `REMOVE_SERVICES_AND_SITES_EVENT`, `calcMetricFreq_event`, `loadmetricdata_event` or `purgeMetricStore_event`.

  • Ensure that no SQL processes are updating the DB.
    mysql -h <host> -u <user> -p
    show processlist;
    exit;
    • You should see just 2 sessions: event_scheduler and the session you are currently connected with.
    • You shouldn't see any sessions where SQL is being executed.
      • If there are any, wait for them to finish.
      • Kill these sessions if they don't finish in a reasonable time.
  • Delete obsolete metrics
    Several metrics that existed in MDDB have been removed from POEM (see SAM-2934).
    To remedy this, download the following files: delete_obsolete_metrics_proc.sql, delete_obsolete_profiles.sql and delete_obsolete_metrics_from_mrs.sql.
    Then run the following commands in MySQL:
    mysql -h <host> -u <user> mrs -p < delete_obsolete_metrics_proc.sql
    mysql -h <host> -u <user> mrs -p < delete_obsolete_profiles.sql
    mysql -h <host> -u <user> mrs -p < delete_obsolete_metrics_from_mrs.sql
    mysql -h <host> -u <user> mrs -p
    mysql> call delete_obsolete_profiles(1000);
    mysql> DROP PROCEDURE IF EXISTS delete_obsolete_metrics;
    mysql> DROP PROCEDURE IF EXISTS delete_obsolete_profiles;
  • Run YAIM as during a normal upgrade (apply all Configuration Changes first)
  • Create MySQL event
    mysql -h <host> -u <user> -p
    use mrs;
    CREATE EVENT `REMOVE_SERVICES_AND_SITES_EVENT` ON SCHEDULE EVERY 15 MINUTE
    STARTS CURRENT_TIMESTAMP DO CALL REMOVE_SERVICES_AND_SITES();
    exit;
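
The check that follows the "Drop mysql events" step (none of the four events may remain before the DB upgrade) can also be scripted. This is a minimal sketch: the function name is hypothetical, and it assumes the event names are supplied one per line on standard input, for instance from a query against information_schema.EVENTS.

```shell
#!/bin/bash
# Reads event names (one per line) on stdin and prints any of the four
# events that must be dropped before the DB upgrade.
# Returns 0 when none of them is present, 1 otherwise.
# Matching is case-insensitive and must cover the whole line.
remaining_obsolete_events() {
    ! grep -ixE 'REMOVE_SERVICES_AND_SITES_EVENT|calcMetricFreq_event|loadmetricdata_event|purgeMetricStore_event'
}

# Possible use on the database host (not executed here):
#   mysql -h <host> -u <user> -p -N \
#     -e "SELECT EVENT_NAME FROM information_schema.EVENTS WHERE EVENT_SCHEMA='mrs'" \
#     | remaining_obsolete_events && echo "events dropped"
```

The check applies before YAIM is run. After the final step, REMOVE_SERVICES_AND_SITES_EVENT is expected to exist again.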

Known Issues

If you experience an MDDB synchronizer error during yaim, please comment out the following line in /opt/glite/yaim/functions/config_mddb:
su nagios -l -c "/usr/bin/check_mddb_sync -t $MDDB_SYNC_TIMEOUT"
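
Commenting out that line can be done with sed instead of an editor. The following is a sketch under the assumption that every uncommented line mentioning check_mddb_sync in the file should be disabled; the function name is made up for illustration.

```shell
#!/bin/bash
# Comment out every uncommented line that mentions check_mddb_sync.
# The original file is kept next to it with a .bak suffix.
# Lines that are already commented out are left untouched.
comment_out_mddb_sync_check() {
    sed -i.bak -e '/check_mddb_sync/ s/^\([[:space:]]*\)\([^#[:space:]]\)/\1#\2/' "$1"
}
```

It would be called as comment_out_mddb_sync_check /opt/glite/yaim/functions/config_mddb.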

The new package grid-monitoring-probes-org.sam-0.5.7-1.el5 solves the problem of running the org.sam.WN* checks on the SL6 platform. More details can be found here: https://tomtools.cern.ch/jira/browse/SAM-2999. This package contains a nagios binary that cannot be executed on 32-bit architectures and fails on some older WN platforms behind the CE service type. Deploying this package will cause the CE service type tests to become UNKNOWN.

Due to SAM-2878, NGI Nagioses do not test services that are NOT in production.
If you need to test "non production" services, replace the file /usr/lib/perl5/vendor_perl/5.8.5/NCG/SiteInfo/ATP.pm from grid-monitoring-config-gen-0.89.7-1.el5 with the attached file SiteInfo/ATP.pm.
Note that this patch has a side effect: sites in candidate, certification or suspended status will also be monitored.
These sites can be removed manually by modifying the file /etc/ncg/ncg.localdb, e.g.
REMOVE_SITE!CERN-PROD

ncg needs to be executed after this change.
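
When several sites have to be excluded, the REMOVE_SITE entries can be appended with a small helper that skips entries already present. This is only a sketch; the function name and the site names in the usage example are hypothetical.

```shell
#!/bin/bash
# Append "REMOVE_SITE!<site>" to a localdb file for each site given,
# skipping entries that are already present.
# Usage: add_remove_site_entries <localdb-file> <site> [<site> ...]
add_remove_site_entries() {
    local localdb="$1" site
    shift
    for site in "$@"; do
        # Single quotes keep the '!' literal in interactive shells too.
        if ! grep -qxF 'REMOVE_SITE!'"${site}" "$localdb" 2>/dev/null; then
            echo 'REMOVE_SITE!'"${site}" >> "$localdb"
        fi
    done
}
```

For example: add_remove_site_entries /etc/ncg/ncg.localdb EXAMPLE-SITE-A EXAMPLE-SITE-B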

For machines running the latest version of glite-UI (3.2.10-1 or higher):
Please restart Nagios after yaim execution. Otherwise you may see problems similar to SAM-1693.
service nagios restart

Machines running the dax component (only sam-gridmon) must be registered with the Msg Broker host to enable the consumer to consume FTS messages.

There is a bug in a Yaim function: NCG_BACKUP_INSTANCE is not properly updated (https://tomtools.cern.ch/jira/browse/SAM-2797). If you switch from a backup to an active instance, the variable BACKUP_INSTANCE in /etc/sysconfig/ncg remains set and keeps the instance from sending data out. The solution is to remove /etc/sysconfig/ncg and rerun Yaim.

Because of an MRS bug (https://tomtools.cern.ch/jira/browse/SAM-3013), VO-independent metrics are not added to statuschange_service_profile on MySQL.
To fix this, download the attached patch.sql and apply it (while connected to your MySQL DB) with:
source patch.sql;

List of new metrics

dg.CREAM-CE:

dg.ARC-CE:

dg.TargetSystemFactory:

List of Issues fixed in this release (Without issues discovered during validation)


List of Issues fixed in this release (Discovered during nightly validation)


ATP.pm (text/x-perl-script)
delete_obsolete_metrics_proc.sql (application/octet-stream)
delete_obsolete_profiles.sql (application/octet-stream)
delete_obsolete_metrics_from_mrs.sql (application/octet-stream)
patch.sql (application/octet-stream)
Document generated by Confluence on Feb 27, 2014 10:19