Oracle® Enterprise Manager Administration
11g Release 1 (11.1.0.1)

Part Number E16790-03

17 High Availability: Single Resource Configurations

Single resource configurations consist of a single-instance Enterprise Manager configuration that uses some form of protected storage to protect both the repository database and the software installation. This is one of the most common installation types, and implementing high availability for it is cost-effective in both time and resources because the objective is to leverage technology that is already available, such as Recovery Manager, Flashback, Ultrasafe, and Automatic Storage Management.

This chapter covers the following topics:

About Single Resource Configurations

The configurations described in this chapter are provided as examples only. The actual Grid Control configurations that you deploy in your own environment will vary depending upon the needs of your organization. For example, in a production environment you will likely want to implement firewalls and other security considerations. For specific information on implementing firewalls and security protocols, you should refer to the following Enterprise Manager documentation:

Besides providing a description of common configurations, this chapter can also help you understand the architecture and flow of data among the Grid Control components. Based on this knowledge, you can make better decisions about how to configure Grid Control for your specific management requirements.

The Grid Control architecture consists of the following software components:

Deploying Grid Control Components on a Single Host

Figure 17-1 shows how each of the Grid Control components is configured to interact when you install Grid Control on a single host. This is the default configuration that results when you use the Installing a New Grid Control installation type.

In Enterprise Manager release 11g, the installation does not create a new database. You must install the database yourself, and it should reside on the same host as Enterprise Manager.

Figure 17-1 Grid Control Components Installed on a Single Host

When you install all the Grid Control components on a single host, the management data travels along the following paths:

  1. Administrators use the Grid Control console to monitor and administer the managed targets that are discovered by the Management Agents on each host. The Grid Control console uses the following URL to connect to the Oracle HTTP Server:

    https://host1.acme.com:7799/em

    The Management Service retrieves data from the Management Repository as it is requested by the administrator using the Grid Control console.

  2. The Management Agent uploads its data (which includes management data about all the managed targets on the host, including the Management Service and the Management Repository database) directly to the Oracle HTTP Server by way of the upload URL. The default port for the upload URL is 1159 (if it is available during the installation procedure). The upload URL is defined by the REPOSITORY_URL property in the following configuration file in the Management Agent home directory:

    AGENT_HOME/sysman/config/emd.properties (UNIX)
    AGENT_HOME\sysman\config\emd.properties (Windows)
    

    See Also:

    For more information about the Oracle Enterprise Manager directory structure (AGENT_HOME directory in particular), see the Oracle® Enterprise Manager Grid Control Advanced Installation and Configuration Guide.
  3. The Management Service uses JDBC connections to load data into the Management Repository database and to retrieve information from the Management Repository so it can be displayed in the Grid Control console. The Management Repository connection details can be listed and changed by using the following emctl commands:

    emctl config oms -list_repos_details
    emctl config oms -store_repos_details
    

    See Also:

    "Reconfiguring the Oracle Management Service" for more information on modifying the Management Repository connection information.
  4. The Management Service sends data to the Management Agent by way of HTTP. The Management Agent software includes a built-in HTTP listener that listens on the Management Agent URL for messages from the Management Service. As a result, the Management Service can bypass the Oracle HTTP Server and communicate directly with the Management Agent. If the Management Agent is on a remote system, no Oracle HTTP Server is required on the Management Agent host.

    The Management Service uses the Management Agent URL to monitor the availability of the Management Agent, submit Enterprise Manager jobs, and perform other management functions.

    The Management Agent URL can be identified by the EMD_URL property in the following configuration file in the Management Agent home directory:

    AGENT_HOME/sysman/config/emd.properties (UNIX)
    AGENT_HOME\sysman\config\emd.properties (Windows)
    

    For example:

    EMD_URL=https://host1.acme.com:3872/emd/main
    

    In addition, the name of the Management Agent as it appears in the Grid Control console consists of the Management Agent host name and the port used by the Management Agent URL.
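
As a rough illustration of the two emd.properties settings described above, the following shell sketch reads a property from the file and derives the host:port form of the Management Agent name from EMD_URL. The function names emd_property and agent_name_from_url are hypothetical helpers, not part of Enterprise Manager, and the sketch assumes each property sits on one line as KEY=value with no surrounding spaces.

```shell
# Print the value of one property from an emd.properties-style file.
# Usage: emd_property <file> <key>
# (hypothetical helper; assumes lines of the form KEY=value)
emd_property() {
    # Print everything after "KEY=" for the first matching line.
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Derive the host:port part of a URL such as the EMD_URL value.
# Usage: agent_name_from_url <url>
agent_name_from_url() {
    # Strip the scheme, then drop the path that follows host:port.
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|/.*$||'
}
```

For the EMD_URL example shown earlier, agent_name_from_url "https://host1.acme.com:3872/emd/main" prints host1.acme.com:3872.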

Backup and Recovery

Although Enterprise Manager functions as a single entity, technically, it is built on a distributed, multi-tier software architecture composed of the following software components:

Each component, being uniquely different in composition and function, requires different approaches to backup and recovery. For this reason, the backup and recovery strategies are discussed on a per-tier basis in this chapter. For an overview of Enterprise Manager architecture, refer to the Oracle® Enterprise Manager Grid Control Basic Installation Guide.

Oracle Configuration Manager

Oracle Configuration Manager (OCM) is used to collect client configuration information and upload it to the Oracle repository. When the client configuration data is uploaded on a regular basis, customer support representatives can analyze this data and provide better service to the customers.

When you install Oracle software, the installer provides an option to set up and configure OCM. In recovery scenarios where software is installed in software-only mode, OCM can be configured manually by running the following command from the OMS and Agent Oracle Homes:

<OracleHome>/ccr/bin/setupCCR

The Oracle Configuration Manager client is installed into the ORACLE_HOME directory. Once installed, OCM collects configuration data related to the ORACLE_HOME directory and the host on which it is installed. In addition to collecting and uploading configuration data, it also checks if any software updates to the Oracle Configuration Manager client are available. If updates are available, it downloads them and updates the Oracle Configuration Manager software installed on the customer's system.

Repository Backup and Recovery

The Repository is the storage location for all the information collected by the Agent. It consists of objects such as database jobs, packages, procedures, views, and tablespaces. Because it is configured in an Oracle Database, the backup and recovery strategies for the repository are essentially the same as those for the Oracle Database. Backup procedures for the database are well established and can be implemented using the RMAN backup utility, which can be accessed via the Enterprise Manager console.

Repository Backup

Oracle recommends using High Availability Best Practices for protecting the Repository database against unplanned outages. As such, use the following standard database backup strategies.

  • Database should be in archivelog mode. Not running the repository database in archivelog mode leaves the database vulnerable to being in an unrecoverable condition after a media failure.

  • Perform regular hot backups with RMAN using the Recommended Backup Strategy option via the Enterprise Manager console. Other technologies such as Data Guard and RAC can also be used as part of a comprehensive strategy to prevent data loss.

Adhering to these strategies will create a full backup and then create incremental backups on each subsequent run. The incremental changes will then be rolled up into the baseline, creating a new full backup baseline.

Using the Recommended Backup Strategy also takes advantage of the capabilities of Enterprise Manager to execute the backups: Jobs will be automatically scheduled through the Job sub-system of Enterprise Manager. The history of the backups will then be available for review and the status of the backup will be displayed on the repository database target home page. This backup job along with archiving and flashback technologies will provide a restore point in the event of the loss of any part of the repository. This type of backup, along with archive and online logs, allows the repository to be recovered to the last completed transaction.

You can view when the last repository backup occurred on the Management Services and Repository Overview page under the Repository details section.

Setting Up the Backup

First, navigate to the Enterprise Manager Recovery Settings page (Target-->Database--><Repository Database Target>-->Availability-->Recovery Settings) and enable Archive Logging then Flashback Database as shown in Figure 17-2.

Figure 17-2 Recovery Settings Page

Next, navigate to the Backup Policies page (Target-->Database--><Repository Database Target>-->Availability-->Backup Settings-->Policy) and enable Block Change Tracking to speed up backup operations as shown in Figure 17-3.

Figure 17-3 Backup Policy Page

Figure 17-4 Backup Policy Page

A thorough summary of how to configure backups using Enterprise Manager is available in the Oracle Database 2 Day DBA guide. For additional information on Database high availability best practices, review the Oracle Database High Availability Best Practices documentation.

Repository Recovery

Recovery of the Repository database must be performed using RMAN since Grid Control will not be available when the repository database is down. There are two recovery cases to consider:

  • Full Recovery: No special consideration is required for Enterprise Manager.

  • Point-in-Time/Incomplete Recovery: Recovered repository may be out of sync with Agents because of lost transactions. In this situation, some metrics may show up incorrectly in the Grid Control console unless the repository is synchronized with the latest state available on the Agents.

A repository resync feature (Enterprise Manager version 10.2.0.5 and later) allows you to automate the process of synchronizing the Enterprise Manager repository with the latest state available on the Agents.

Note:

resync requires Agents version 10.2.0.5 or later. Older Agents must be synchronized manually. See "Manually Resynchronizing Agents".

To resynchronize the repository with the Agents, use the resync repos command of the Enterprise Manager command-line utility (emctl):

emctl resync repos -full -name "<descriptive name for the operation>"

You must run this command from the OMS Oracle Home after restoring the repository but BEFORE starting the OMS. After submitting the command, start all OMSs and monitor the progress of repository resynchronization from the Enterprise Manager console's Repository Resynchronization page, as shown in Figure 17-5.

Figure 17-5 Repository Synchronization Page

Repository recovery is complete when the resynchronization jobs complete on all Agents.

Oracle strongly recommends that the repository database be run in archivelog mode so that in case of failure, the database can be recovered to the latest transaction. If the database cannot be recovered to the last transaction, Repository Synchronization can be used to restore monitoring capabilities for targets that existed when the last backup was taken. Actions taken after the backup will not be recovered automatically. Some examples of actions that will not be recovered automatically by Repository Synchronization are:

  • Notification Rules

  • Preferred Credentials

  • Groups, Services, Systems

  • Jobs/Deployment Procedures

  • Custom Reports

  • New Agents

Manually Resynchronizing Agents

The Enterprise Manager Repository Synchronization feature can only be used for Agents 10.2.0.5 or later. Older Agents must be resynchronized manually using the following procedure:

  1. Shut down the Agent.

  2. Delete the agentstmp.txt, lastupld.xml, state/* and upload/* files from the <AGENT_HOME>/sysman/emd directory.

  3. Restart the Agent.
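
Step 2 of this procedure could be scripted along the following lines. This is a minimal sketch: clean_agent_state is a hypothetical helper name, and the sketch assumes that state and upload are subdirectories of <AGENT_HOME>/sysman/emd whose contents can safely be removed while the Agent is stopped.

```shell
# Remove the Agent's temporary, state and pending upload files (step 2).
# Usage: clean_agent_state <AGENT_HOME>
# (hypothetical helper; run it only while the Agent is shut down)
clean_agent_state() (
    emd_dir="$1/sysman/emd"
    # Refuse to run against a path that does not look like an Agent home.
    if [ ! -d "$emd_dir" ]; then
        echo "not an Agent emd directory: $emd_dir" >&2
        exit 1
    fi
    rm -f "$emd_dir/agentstmp.txt" "$emd_dir/lastupld.xml"
    # Empty state/ and upload/ but keep the directories themselves.
    rm -rf "$emd_dir/state"/* "$emd_dir/upload"/*
)

# Typical sequence on a pre-10.2.0.5 Agent host:
#   emctl stop agent
#   clean_agent_state "$AGENT_HOME"
#   emctl start agent
```

The function body runs in a subshell, so its variables and its early exit do not affect the calling shell.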

Recovery Scenarios

A prerequisite for repository (or any database) recovery is to have a valid, consistent backup of the repository. Using Enterprise Manager to automate the backup process ensures regular, up-to-date backups are always available if repository recovery is ever required. Recovery Manager (RMAN) is a utility that backs up, restores, and recovers Oracle Databases. The RMAN recovery job syntax should be saved to a safe location. This allows you to perform a complete recovery of the Enterprise Manager repository database. In its simplest form, the syntax appears as follows:

run {
  restore database;
  recover database;
}

Actual syntax will vary in length and complexity depending on your environment. For more information on extracting syntax from an RMAN backup and recovery job, or using RMAN in general, see the Oracle Database Backup and Recovery User's Guide.
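
One way to keep the recovery syntax in a safe location is to store it in an RMAN command file. The sketch below writes the minimal run block shown above to a file. The function name write_rman_recovery_script and any file path passed to it are hypothetical, and a real environment will usually need a longer script.

```shell
# Write the minimal restore/recover run block to an RMAN command file.
# Usage: write_rman_recovery_script <output file>
# (hypothetical helper; the block is the simplest form shown in this chapter)
write_rman_recovery_script() {
    cat > "$1" <<'EOF'
run {
  restore database;
  recover database;
}
EOF
}

# Example (paths are placeholders):
#   write_rman_recovery_script /safe/location/recover_repos.rman
#   rman target / cmdfile=/safe/location/recover_repos.rman
```

The rman invocation in the comment assumes operating system authentication to the repository database; adjust the connection to match your site.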

The following scenarios illustrate various repository recovery situations along with the recovery steps.

Full Recovery on the Same Host

Repository database is running in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS(s) using emctl stop oms.

  2. Recover the database using RMAN.

  3. Bring the site up using the command emctl start oms on all OMS(s).

  4. Verify that the site is fully operational.

Incomplete Recovery on the Same Host

Repository database is running in noarchivelog mode. Full offline backup is available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS(s) using emctl stop oms.

  2. Recover the database using RMAN.

  3. Initiate Repository Resync using emctl resync repos -full -name "<resync name>" from one of the OMS Oracle Homes.

  4. Start the OMS(s) using emctl start oms.

  5. Manually fix any pre-10.2.0.5 Agent by shutting down the Agent, deleting the agentstmp.txt, lastupld.xml, state/* and upload/* files under the <AGENT_HOME>/sysman/emd directory, and then restarting the Agents.

  6. Log into Grid Control. Navigate to Management Services and Repository Overview page. Click on Repository Synchronization under Related Links. Monitor the status of resync jobs. Resubmit failed jobs, if any, after fixing the error.

  7. Verify that the site is fully operational.
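
The ordering of steps 1 through 4 matters, because the resync must be submitted after the restore but before any OMS starts. The following sketch chains those steps for a single OMS host and stops at the first failure. The function name resync_after_restore and the EMCTL override variable are hypothetical, and the restore command passed in stands for whatever site-specific RMAN job you use.

```shell
# Command used to invoke emctl; set EMCTL=echo to preview the calls.
EMCTL="${EMCTL:-emctl}"

# Run the OMS-side steps around an incomplete repository recovery.
# Usage: resync_after_restore <resync name> <restore command> [args...]
# (hypothetical helper; covers one OMS host only)
resync_after_restore() (
    name="$1"
    shift
    $EMCTL stop oms &&                          # step 1: stop the OMS
    "$@" &&                                     # step 2: site-specific restore
    $EMCTL resync repos -full -name "$name" &&  # step 3: before the OMS starts
    $EMCTL start oms                            # step 4: start the OMS
)

# Example (restore_repository.sh is a placeholder for your RMAN job):
#   resync_after_restore "post-restore resync" ./restore_repository.sh
```

In a site with several OMSs, the stop and start steps would need to be run on every OMS host, while the resync is submitted from only one of them.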

Full Recovery on a Different Host

The repository database is running on host "A" in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS(s) using the command emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository in the credential store by running:

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Start the OMS(s) using the command emctl start oms.

  5. Relocate the repository database target to the Agent running on host "B" by running the following command from the OMS:

    emctl config repos -host <hostB> -oh <OH of repository on hostB> -conn_desc "<TNS connect descriptor>"
    

    Note:

    This command can only be used to relocate the repository database under the following conditions:
    • An Agent is already running on this machine.

    • No database on host "B" has been discovered.

    If a new Agent had been installed on host "B", you must ensure there are NO previously discovered database targets.

  6. Change the monitoring configuration for the OMS and Repository target by running the following command from the OMS:

    emctl config emrep -conn_desc "<TNS connect descriptor>"
    
  7. Verify that the site is fully operational.
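
The -repos_conndesc and -conn_desc options above take a TNS connect descriptor. As a sketch only, the helper below assembles a basic single-address descriptor. The name make_conn_desc is hypothetical, and the host name, listener port 1521 and SID emrep in the usage lines are assumed values that must be replaced with those of the database restored on host "B".

```shell
# Build a basic single-address TNS connect descriptor.
# Usage: make_conn_desc <host> <port> <sid>
# (hypothetical helper; RAC or service-name descriptors need a different form)
make_conn_desc() {
    printf '(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s)))(CONNECT_DATA=(SID=%s)))\n' "$1" "$2" "$3"
}

# Example with assumed values:
#   desc=$(make_conn_desc hostB.acme.com 1521 emrep)
#   emctl config oms -store_repos_details -repos_conndesc "$desc" -repos_user sysman
#   emctl config emrep -conn_desc "$desc"
```

Building the descriptor once and reusing it keeps steps 3, 5 and 6 pointing at the same database.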

Incomplete Recovery on a Different Host

The repository database is running on host "A" in noarchivelog mode. Full offline backup is available. Host "A" is lost due to hardware failure. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS(s) using emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository in the credential store:

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Initiate Repository Resync:

    emctl resync repos -full -name "<resync name>"

    from one of the OMS Oracle Homes.

  5. Start the OMS(s) using the command emctl start oms.

  6. Run the command to relocate the repository database target to the Agent running on host "B":

    emctl config repos -agent <agent on host B> -host <hostB> -oh <OH of repository on hostB> -conn_desc "<TNS connect descriptor>"

  7. Run the command to change monitoring configuration for the OMS and Repository target:

    emctl config emrep -conn_desc "<TNS connect descriptor>"

  8. Manually fix all pre-10.2.0.5 Agents by shutting down the Agents, deleting the agentstmp.txt, lastupld.xml, state/* and upload/* files under the <AGENT_HOME>/sysman/emd directory and then restarting the Agents.

  9. Log in to Grid Control. Navigate to Management Services and Repository Overview page. Click on Repository Synchronization under Related Links. Monitor the status of resync jobs. Resubmit failed jobs, if any, after fixing the error mentioned.

  10. Verify that the site is fully operational.

Oracle Management Service Backup and Recovery

The Oracle Management Service (OMS) orchestrates with Management Agents to discover targets, monitor and manage them, and store the collected information in a repository for future reference and analysis. The OMS also renders the Web interface for the Enterprise Manager console. For Enterprise Manager version 11.1, the OMS architecture has changed.

Backing Up the OMS

The OMS is generally stateless. Some transient and configuration data is stored on the OMS file system. The shared loader "recv" directory temporarily stores metric data uploaded from Agents before the data is loaded into the repository. If an OMS goes down, the surviving OMS(s) upload the data stored in the shared loader location. In a High Availability (HA) configuration, the shared loader receive directory should be protected using an HA storage technology, such as a redundant disk.

A snapshot of OMS configuration can be taken using the emctl exportconfig oms command.

emctl exportconfig oms [-sysman_pwd <sysman password>]
      [-dir <backup dir>]     Specify directory to store backup file
      [-keep_host]            Specify this parameter if the OMS was installed
                              using a virtual hostname.
                              For example: ORACLE_HOSTNAME

Note:

The exportconfig oms command is only available with Enterprise Manager version 10.2.0.5 or newer.

Running exportconfig captures a snapshot of the OMS at a given point in time, thus allowing you to back up the most recent OMS configuration on a regular basis. If required, the most recent snapshot can then be restored on a fresh OMS installation on the same or different host.
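
To take snapshots on a regular basis, the export could be wrapped in a small script that writes each one to a dated directory. The sketch below is illustrative only. The function name export_oms_config, the EMCTL override variable and the one-directory-per-day layout are assumptions, and the optional -sysman_pwd argument is left out on the assumption that emctl then asks for the password interactively.

```shell
# Command used to invoke emctl; set EMCTL=echo to preview the call.
EMCTL="${EMCTL:-emctl}"

# Export the OMS configuration into a per-day subdirectory of <backup root>.
# Usage: export_oms_config <backup root>
# (hypothetical helper; the directory layout is an assumption)
export_oms_config() (
    dir="$1/$(date +%Y%m%d)"
    mkdir -p "$dir" &&
    $EMCTL exportconfig oms -dir "$dir"
)

# Example (the backup root is a placeholder):
#   export_oms_config /backup/oms_config
```

Keeping the snapshots on storage separate from the OMS host means they remain available if that host is lost.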

Backup strategies for the OMS components are as follows:

  • Software Homes

    Composed of three WebLogic components: the Middleware Home, the OMS Oracle Home, and the WebTier (OHS) Oracle Home. Software Homes only change when patches or patchsets are applied. For this reason, filesystem-level backups should be taken after each patch/patchset application. You should back up the Oracle inventory files along with the Software Homes.

    Important:

    Beginning with Enterprise Manager version 11.1, the location of the OMS Oracle Home must be the same for all OMSs in your monitored environment.
  • Instance Home

    Composed of WebLogic, OMS and WebTier configuration files. The Instance Home can be backed up using the emctl exportconfig oms command.

  • Software Library

    Composed of components used by Enterprise Manager patching and provisioning functions. Oracle Database Filesystem (DBFS) is recommended for software library backup. DBFS technology allows an Oracle database tablespace to be exposed to applications as a mounted filesystem. Internally, all the files are stored as secure files in the Oracle database. Storing the software library in the Enterprise Manager repository database using DBFS lets you leverage the comprehensive capabilities of the Oracle database to take consistent backups of the software library along with the Enterprise Manager repository. For more information about DBFS, see the Oracle® Database SecureFiles and Large Objects Developer's Guide.

  • Shared Loader RECV Directory

    The shared loader receive (RECV) directory temporarily stores metric data uploaded from Agents before the data is loaded into the repository. Use a high availability storage technology to protect the receive directory.

  • AdminServer

    Beginning with Enterprise Manager version 11.1, the OMS's WebLogic architecture introduces the concept of an AdminServer. The AdminServer operates as the central control entity for the configuration of the entire OMS(s) domain. The AdminServer is an integral part of the first OMS installed in your Grid Control deployment and shares the Software Homes and Instance Home.

Recovering the OMS

If an OMS is lost, it should be reinstalled using "Installing Software Only and Configuring Later". This is an additional Management Service option documented in the Oracle Enterprise Manager Grid Control Installation and Basic Configuration guide. The OMS configuration can then be restored with the OMS Configuration Assistant using the following command:

omsca recover -as -ms -backup_file <file>

Use the export file generated by the emctl exportconfig command shown in the previous section.

Recovering an OMS essentially consists of two steps, recovering the Software Homes and then configuring the Instance Home. When restoring on the same host, the software homes can be restored from filesystem backup. In case a backup does not exist, the software homes can be reconstructed using the software-only installation of WebLogic and OMS, software-only installation of add-ons (if any) and all patches that were applied before the crash. As stated earlier, the location of the OMS Oracle Home is fixed and cannot be changed. Hence, ensure that the OMS Oracle Home is restored in the same location that was used previously.

Once the Software Homes are recovered, the instance home can be reconstructed using the omsca command in recovery mode.

OMS Recovery Scenarios

The following scenarios illustrate various OMS recovery situations along with the recovery steps.

Important:

A prerequisite for OMS recovery is to have recent, valid OMS configuration backups available. Oracle recommends that you back up the OMS using the emctl exportconfig oms command whenever an OMS configuration change is made. This command must be run on the primary OMS running the WebLogic AdminServer.

Alternatively, you can run this command on a regular basis using the Enterprise Manager Job system.

Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host

Site hosts a single OMS. No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command on the primary OMS running the AdminServer. The OMS Oracle Home is lost.

Resolution:

  1. Ensure that loader receive directory and software library locations are still accessible.

  2. Restore the software homes from the filesystem backup taken earlier. Alternatively, if a backup does not exist, use the software-only install method to reconstruct the WebLogic and OMS Oracle Home. Add-ons that were installed earlier need to be reinstalled in software-only mode, and all patches that were applied earlier need to be reapplied. Remember that the location of the OMS Oracle Home needs to be the same as the one used before.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    omsca recover -as -ms -backup_file <file>
    

    Note: The -backup_file to be passed must be the latest file generated from emctl exportconfig oms command.

  4. Configure agent:

    agentca -f
    emctl secure agent -emdWalletSrcUrl <oms url>
    
  5. At this point, two possibilities exist depending upon the port used by the reinstalled agent that comes along with the OMS:

    Option A: Agent uses the same port as the previous installation.

    • OMS automatically blocks the Agent. Resync the Agent from the Agent homepage.

    Option B: Agent uses a different port.

    • Run the command to relocate the OMS and Repository target to reinstalled Agent:

      emctl config emrep -agent <reinstalled agent>

      Example: emctl config emrep -agent foo.us.oracle.com:3872

  6. Locate duplicate targets from the Management Services and Repository Overview page. Relocate duplicate targets from the old agent to the reinstalled Agent. Delete the old Agent.

  7. Verify that the site is fully operational.

Single OMS, No SLB, OMS Restored on a Different Host

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost.

Resolution:

  1. Ensure that loader receive directory and software library locations are accessible from Host "B".

  2. Usually filesystem restore does not work across hosts. Use the software-only install method to reconstruct the WebLogic and OMS Oracle Home. Add-ons that were installed earlier need to be reinstalled in software-only mode, and all patches that were applied earlier need to be reapplied. Remember that the location of the OMS Oracle Home must be the same as the one used before.

  3. Run omsca in recovery mode specifying the OMS configuration backup file generated earlier to configure the new OMS:

    omsca recover -as -ms -backup_file <file>
    
  4. Configure agent:

    agentca -f
    emctl secure agent -emdWalletSrcUrl <oms url>
    
  5. Change the OMS to which all Agents point and then resecure all Agents:

    • Make all Agents in the deployment point to the new OMS running on Host "B". On each Agent, run the following command:

      emctl secure agent -emdWalletSrcUrl "http://hostB:<httpport>/em"

    • Run the command to relocate OMS and Repository target to Agent "B":

      emctl config emrep -agent <agent on host "B">

      Note:

      Because the new machine is using a different hostname from the one originally hosting the OMS, all Agents in your monitored environment must be told where to find the new OMS.
  6. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  7. Verify that the site is fully operational.

Single OMS, No SLB, OMS Restored on a Different Host using the Original Hostname

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost.

Resolution:

  1. Ensure that loader receive directory and software library locations are accessible from Host "B".

  2. Usually filesystem restore does not work across hosts. Use the software-only install method to reconstruct the WebLogic and OMS Oracle Home. Add-ons that were installed earlier need to be reinstalled in software-only mode, and all patches that were applied earlier need to be reapplied. Remember that the location of the OMS Oracle Home needs to be the same as the one used before.

  3. Modify the network configuration such that Host "B" also responds to the hostname of Host "A". Specific instructions on how to configure this are beyond the scope of this document. However, some general configuration suggestions are:

    • Modify your DNS server such that both Host "B" and Host "A" network addresses resolve to the physical IP of Host "B".

    • Multi-home Host "B". Configure an additional IP of Host "A" on Host "B". For example, on Host "B" run the following commands:

      ifconfig eth0:1 <IP of HostA> netmask <netmask>
      /sbin/arping -q -U -c 3 -I eth0 <IP of HostA>
      
  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    omsca recover -as -ms -backup_file <file>
    
  5. Resecure the OMS:

    emctl secure oms -host <Host A>

  6. Configure agent:

    agentca -f
    emctl secure agent -emdWalletSrcUrl <oms url>

  7. Run the command to relocate Management Services and Repository target to Agent "B":

    emctl config emrep -agent <agent on host B>

  8. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  9. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer, Primary OMS Recovered on the Same Host

Site hosts multiple OMSs. All OMSs are fronted by a Server Load Balancer. OMS configuration backed up using the emctl exportconfig oms command on the primary OMS running the WebLogic AdminServer. The primary OMS is lost.

Resolution:

  1. Ensure that shared loader receive directory and shared software library locations are still accessible.

  2. Restore the software homes from the filesystem backup taken earlier. Alternatively, if a backup does not exist, use the software-only install method to reconstruct the WebLogic and OMS Oracle Home. Add-ons that were installed earlier need to be reinstalled in software-only mode, and all patches that were applied earlier need to be reapplied. Remember that the location of the OMS Oracle Home needs to be the same as the one used before.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    omsca recover -as -ms -backup_file <file>

  4. Resecure the Agent that gets installed along with OMS.

    emctl secure agent -emdWalletSrcUrl "http://slb:<httpport>/em"

  5. At this point, two possibilities exist depending upon the port used by the reinstalled agent that comes along with the OMS:

    • Option A: Agent gets the same port as earlier--OMS automatically blocks the agent. Resync the agent from agent homepage.

    • Option B: Agent gets a different port--Run the command to relocate Management Services and Repository target to reinstalled agent:

      emctl config emrep -agent <reinstalled agent>

      Locate duplicate targets from the Management Services and Repository Overview page. Relocate duplicate targets from old agent to reinstalled agent. Delete the old agent.

  6. Re-enroll the additional OMS, if any, with the recovered Administration Server by running emctl enroll oms on each additional OMS.

  7. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host

The site hosts multiple OMSs, fronted by a Server Load Balancer. The OMS configuration was backed up using the emctl exportconfig oms command. The primary OMS on host "A" is lost and needs to be recovered on host "B".

  1. Ensure that the shared loader receive directory and shared software library locations are accessible from the new OMS host (host "B").

  2. Filesystem restore typically does not work across hosts. Use the software-only install method to reconstruct the WebLogic and OMS Oracle homes. Add-ons that were installed earlier need to be reinstalled in software-only mode, and all patches that were applied earlier need to be reapplied. The location of the OMS Oracle home must be the same as the one used before.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    omsca recover -as -ms -backup_file <file>

  4. Configure the Agent by running agentca -f followed by emctl secure agent:

    agentca -f

    emctl secure agent -emdWalletSrcUrl "http://slb:<httpport>/em"

  5. Add the new OMS to the SLB.

  6. Relocate the OMS and Repository target to the reinstalled Agent:

    emctl config emrep -agent <agent on Host B>

  7. Locate duplicate targets from the Management Services and Repository Overview page. Relocate duplicate targets from the Agent on Host "A" to the Agent on Host "B". Delete the Agent on Host "A".

  8. Re-enroll the additional OMS, if any, with the recovered Administration Server:

    emctl enroll oms -as_host <HostB> -as_port <admin secure port>

    Run this command on each additional OMS.

  9. Verify that the site is fully operational.
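
The command-line steps above could be collected into small wrapper functions, as in the following sketch. The function names, their arguments, and the AGENT_EMCTL and OMS_EMCTL variables are assumptions made for illustration; only the commands and flags come from steps 3, 4, 6, and 8. Adding the OMS to the SLB (step 5) and resolving duplicate targets in the console (step 7) remain manual.

```shell
#!/usr/bin/env bash
# Sketch only: function names, arguments, and the *_EMCTL variables are
# hypothetical. The commands and flags are those shown in steps 3, 4, 6, and 8.
AGENT_EMCTL="${AGENT_EMCTL:-emctl}"   # emctl from the Agent home (assumed default)
OMS_EMCTL="${OMS_EMCTL:-emctl}"       # emctl from the OMS home (assumed default)

# Run on Host B, the replacement primary OMS host.
recover_primary_oms() {
  local backup_file="$1"   # export file written by emctl exportconfig oms
  local slb_url="$2"       # for example "http://slb:<httpport>/em"
  local agent_b="$3"       # name of the Agent on Host B

  omsca recover -as -ms -backup_file "$backup_file" || return 1        # step 3
  agentca -f || return 1                                               # step 4
  "$AGENT_EMCTL" secure agent -emdWalletSrcUrl "$slb_url" || return 1  # step 4
  # Step 5 (adding the OMS to the SLB) is performed on the load balancer.
  "$OMS_EMCTL" config emrep -agent "$agent_b" || return 1              # step 6
}

# Run on each additional OMS host once the primary OMS is back (step 8).
reenroll_additional_oms() {
  local as_host="$1" as_port="$2"
  "$OMS_EMCTL" enroll oms -as_host "$as_host" -as_port "$as_port"
}
```

Stopping at the first failing command is a design choice of this sketch: each step depends on the previous one, so continuing after a failure would rarely be useful.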

Multiple OMS, SLB configured, additional OMS recovered on same or different host

This is a multiple-OMS site, with the OMSs fronted by an SLB. The OMS configuration was backed up using the emctl exportconfig oms command on the first OMS. An additional OMS is lost and needs to be recovered on the same or a different host.

  1. Ensure that the shared loader receive directory and shared software library locations are accessible.

  2. If recovering on the same host, restore the software homes from a filesystem backup. Alternatively, if a backup does not exist, or when recovering on a different host, use the software-only install method to reconstruct the WebLogic and OMS Oracle homes. Add-ons that were installed earlier need to be reinstalled in software-only mode, and all patches that were applied earlier need to be reapplied. The location of the restored OMS Oracle home must be the same as the previous one.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    omsca recover -ms -backup_file <file>

  4. Configure the Agent by running agentca -f followed by emctl secure agent:

    agentca -f

    emctl secure agent -emdWalletSrcUrl "http://slb:<httpport>/em"

  5. Add the new OMS (if recovered on a different host) to the SLB.

  6. At this point, three possibilities exist depending upon the host on which the OMS was recovered and the port used by the reinstalled Agent that comes along with the OMS:

    • Option A: OMS installed on same host and agent gets the same port as earlier

      OMS automatically blocks the agent. Resync the agent from agent homepage.

    • Option B: OMS installed on same host and agent gets a different port

      Locate duplicate targets from the Management Services and Repository Overview page. Relocate duplicate targets from old agent to reinstalled agent. Delete the old agent.

    • Option C: OMS installed on different host

      Locate duplicate targets from the Management Services and Repository Overview page. Relocate duplicate targets from old agent to reinstalled agent.

  7. Verify that the site is fully operational.
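
The three options in step 6 amount to a small decision table, which the following sketch writes out. The function name and its "yes"/"no" arguments are hypothetical; the actions it prints are the ones listed above.

```shell
# Sketch only: agent_followup_option and its arguments are hypothetical.
# $1: was the OMS recovered on the same host?    ("yes" or "no")
# $2: did the Agent get the same port as before? ("yes" or "no")
agent_followup_option() {
  local same_host="$1" same_port="$2"
  if [ "$same_host" != "yes" ]; then
    echo "C: relocate duplicate targets from the old Agent to the reinstalled Agent"
  elif [ "$same_port" = "yes" ]; then
    echo "A: the OMS blocks the Agent; resynchronize it from the Agent homepage"
  else
    echo "B: relocate duplicate targets to the reinstalled Agent, then delete the old Agent"
  fi
}
```

For example, agent_followup_option yes no prints the Option B action.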

Agent Backup and Recovery

The Agent is an integral software component that is deployed on each monitored host. It is responsible for monitoring all the targets running on that host, communicating that information to the middle-tier OMS, and managing and maintaining the host and its targets.

Backing Up Agents

There are no special considerations for backing up Agents. As a best practice, reference Agent installs should be maintained for different platforms and kept up-to-date in terms of customizations in the emd.properties file and patches applied. Use Deployment options from the Grid Control console to install and maintain reference Agent installs.

Recovering Agents

If an Agent is lost, it should be reinstalled by cloning from a reference install. Cloning from a reference install is often the fastest way to recover an Agent install because it is not necessary to track and reapply customizations and patches. Take care to reinstall the Agent using the same port. Using Enterprise Manager's Agent Resynchronization feature, a reinstalled Agent can be reconfigured using target information present in the repository. When the Agent is reinstalled using the same port, the OMS detects that it has been reinstalled and blocks it temporarily to prevent the auto-discovered targets in the reinstalled Agent from overwriting previous customizations.

Blocked Agents:

An Agent is blocked when the OMS rejects all heartbeat or upload requests from it. A blocked Agent therefore cannot upload any alerts or metric data to the OMS, although it continues to collect monitoring data.

The Agent can be resynchronized and unblocked from the Agent homepage by clicking on the Resynchronize Agent button. Resynchronization pushes all targets from the repository to the Agent and then unblocks the Agent.

Agent Recovery Scenarios

The following scenarios illustrate various Agent recovery situations along with the recovery steps. Agent recovery is supported for Agent versions 10.2.0.5 and later. The Agent resynchronization feature requires that a reinstalled Agent use the same port as the previous Agent that crashed.

Agent Reinstall Using the Same Port

An Agent is monitoring multiple targets. The Agent installation is lost.

  1. Deinstall Agent OracleHome using the Oracle Universal Installer.

  2. Install a new Agent or use the Agent clone option to reinstall the Agent through Enterprise Manager. Specify the same port as used by the crashed Agent. The install location need not be the same as that of the previous install.

    The OMS detects that the Agent has been reinstalled and blocks the Agent.

  3. Initiate Agent Resynchronization from the Agent homepage.

    All targets in the repository are pushed to the new Agent. The Agent is instructed to clear backlogged files and then do a clearstate. The Agent is unblocked.

  4. Reconfigure User-defined Metrics if the location of the User-defined Metric scripts has changed.

  5. Verify that the Agent is operational and all target configurations have been restored.

Agent Restore from Filesystem Backup

An Agent is monitoring multiple targets. File system backup for the Agent OracleHome exists. The Agent install is lost.

  1. Deinstall Agent OracleHome using OUI.

  2. Restore the Agent from file system backup. Start the Agent.

    The OMS detects that the Agent has been restored from backup and blocks the Agent.

  3. Initiate Agent Resynchronization from the Agent homepage.

    All targets in the repository are pushed to the new Agent. The Agent is instructed to clear backlogged files and performs a clearstate. The Agent is unblocked.

  4. Verify that the Agent is functional and all target configurations have been restored using the following emctl commands:

    emctl status agent
    
    emctl upload agent 
    

    There should be no errors and no XML files in the backlog.
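
The two verification commands in step 4 can be run together, as in the following sketch. The verify_agent helper is hypothetical, and it assumes that emctl returns a nonzero exit status when a command fails, which this chapter does not state. It does not inspect the backlog, so the command output should still be read.

```shell
# Sketch only: verify_agent is a hypothetical helper. It assumes emctl
# exits with a nonzero status on failure.
verify_agent() {
  local cmd
  for cmd in "status agent" "upload agent"; do
    # $cmd is deliberately unquoted so that it splits into two arguments.
    if ! emctl $cmd; then
      echo "emctl $cmd failed" >&2
      return 1
    fi
  done
  echo "Agent checks passed"
}
```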

Recovering from a Simultaneous OMS-Repository Failure

When both the OMS and the repository fail simultaneously, recovery becomes more complex, depending upon factors such as whether the OMS and repository recovery has to be performed on the same or a different host, or whether there are multiple OMSs fronted by an SLB. In general, the order of recovery for this type of compound failure should be the repository first, followed by the OMS(s), using the steps outlined in the appropriate recovery scenarios discussed earlier. The following scenarios illustrate two OMS-Repository failures and the requisite recovery steps.

Collapsed Configuration: Incomplete Repository Recovery, Primary OMS on the Same Host

The repository and the primary OMS are installed on the same host (host "A"). The repository database is running in noarchivelog mode. A full cold backup is available. A recent OMS backup file exists (emctl exportconfig oms). The repository, OMS, and the Agent crash.

  1. Follow the repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS OracleHome is not available and repository resynchronization has to be initiated before starting an OMS against the restored repository, submit the resync using the following PL/SQL block. Log in to the repository as SYSMAN using SQL*Plus and run:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    /
    
  2. Follow the OMS recovery procedure shown in Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host.

  3. Verify that the site is fully operational.
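
The PL/SQL call in step 1 can be placed in a script file and run from SQL*Plus, as in the following sketch. The resync_sql helper, the resync name, and the file name are hypothetical; the procedure call is the one shown in step 1. The slash on its own line tells SQL*Plus to execute the block.

```shell
# Sketch only: resync_sql is a hypothetical helper that prints a SQL*Plus
# script submitting the full repository resync shown in step 1.
resync_sql() {
  local resync_name="$1"
  case "$resync_name" in
    *"'"*) echo "resync name must not contain a single quote" >&2; return 1 ;;
  esac
  printf "begin emd_maintenance.full_repository_resync('%s'); end;\n/\nexit\n" "$resync_name"
}

# Possible use (SQL*Plus prompts for the SYSMAN password):
#   resync_sql "post_restore_1" > resync.sql
#   sqlplus sysman@<repository connect identifier> @resync.sql
```

Writing the script to a file, rather than piping it to SQL*Plus, keeps standard input free for the password prompt.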

Distributed Configuration: Incomplete Repository Recovery, Primary OMS and additional OMS on Different Hosts, SLB Configured

The repository, primary OMS, and additional OMS all reside on different hosts. The repository database was running in noarchivelog mode. An OMS backup file from a recent backup exists (emctl exportconfig oms). A full cold backup of the database exists. All three hosts are lost.

  1. Follow the repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS OracleHome is not yet available and repository resynchronization has to be initiated before starting an OMS against the restored repository, submit the resync using the following PL/SQL block. Log in to the repository as SYSMAN using SQL*Plus and run the following:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    /
    
  2. Follow the OMS recovery procedure shown in Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host with the following exception:

    Override the repository connect description present in the backup file by passing the additional omsca parameter -REPOS_CONN_STR <restored repos descriptor>. This needs to be added along with the other parameters listed in Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host.

  3. Follow the OMS recovery procedure shown in Multiple OMS, SLB configured, additional OMS recovered on same or different host with the following exception:

    Override the repository connect description and AdminServer details present in the backup file by passing the additional omsca parameters:

    -REPOS_CONN_STR <restored repos descriptor> -AS_HOST <recovered admin host> -AS_HTTPS_PORT <recovered admin port>
    

    This must be added along with other parameters listed in Multiple OMS, SLB configured, additional OMS recovered on same or different host.

  4. Verify that the site is fully operational.

EMCTL High Availability Commands

The Enterprise Manager command-line utility (emctl) adds many new commands that allow you to perform requisite backup and recovery operations for all major components.

exportconfig oms

Exports a snapshot of the OMS configuration to the specified directory.

Usage:

emctl exportconfig oms [-sysman_pwd <sysman password>]
      [-dir <backup dir>]     Specify the directory used to store the backup file
      [-keep_host]            Specify to back up hostname if no SLB is defined
                              (Use this option only if recovery will be performed
                               on the machine that responds to this hostname)
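
As an illustration, the export could be wrapped so that each backup lands in a dated directory. The backup_oms_config helper and the directory layout are assumptions of this sketch; only exportconfig oms and its -dir option come from the usage above. The sketch omits -sysman_pwd, on the assumption that emctl then prompts for the password.

```shell
# Sketch only: backup_oms_config and the dated directory layout are
# hypothetical. emctl is assumed to be the one in the OMS home.
backup_oms_config() {
  local base_dir="$1"
  local backup_dir
  backup_dir="$base_dir/$(date +%Y%m%d)"   # one directory per day
  mkdir -p "$backup_dir" || return 1
  emctl exportconfig oms -dir "$backup_dir"
}
```

Keeping one directory per day makes it easier to pick the most recent export file when running omsca in recovery mode.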

importconfig oms

Imports the OMS configuration from the specified backup file.

Usage:

emctl importconfig oms [-sysman_pwd <sysman password>] [-reg_pwd <registration password>]
      -file <backup file>     Required backup file to import from
      [-key_only]             Specify to import emkey only
      [-no_resecure]          Specify not to resecure the oms after import
                              (default is to resecure after import)

config emrep

Configures the OMS and repository target. The command is used to change the monitoring Agent for the target and/or the connection string used to monitor this target.

Usage:

emctl config emrep [-sysman_pwd <sysman password>]
      [-agent <new agent>]    Specify a new destination Agent for emrep target
      [-conn_desc [<jdbc connect descriptor>]]
                              Update the Connect Descriptor with value if specified,
                              else from value stored in the emoms.properties file.

config repos

Configures the repository database target. The command is used to change the monitoring Agent for the target and/or the monitoring properties (hostname, Oracle Home and connection string used to monitor this target).

Usage:

emctl config repos [-sysman_pwd <sysman password>]
      [-agent <new agent>]    Specify new destination agent for repository target
      [-host <new host>]      Specify new hostname for repository target
      [-oh <new oracle home>] Specify new OracleHome for repository target
      [-conn_desc [<jdbc connect descriptor>]]
                              Update the Connect Descriptor with the specified value,
                              else from the value stored in emoms.properties

resync repos

Submits a repository resynchronization operation. When the -full option is specified, all agents are instructed to upload the latest state to the repository. A list of agents can be specified using the -agentlist option to resync with a given list of agents.

Usage:

emctl resync repos (-full|-agentlist "agent names") [-name "resync name"] [-sysman_pwd "sysman password"]

abortresync repos

Aborts the currently running repository resynchronization operation. Use the -full option to stop a full repository resynchronization. Use the -agentlist option to stop resync on a list of agents.

Usage:

emctl abortresync repos (-full|-agentlist "agent names") -name "resync name" [-sysman_pwd "sysman password"]

statusresync repos

Lists the status of given repository resynchronization operation.

Usage:

emctl statusresync repos -name "resync name" 
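
The resync commands are typically used together: submit a named resynchronization, then query its status by the same name. The following sketch shows that pairing. The submit_full_resync wrapper is hypothetical; the commands and options are taken from the usage lines above.

```shell
# Sketch only: submit_full_resync is a hypothetical wrapper around the
# resync repos and statusresync repos commands described above.
submit_full_resync() {
  local name="$1"
  emctl resync repos -full -name "$name" || return 1
  emctl statusresync repos -name "$name"
}
```

Naming the operation explicitly means the same name can later be passed to statusresync repos or abortresync repos.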

create service

Valid on Windows only. The command creates a service for the Oracle Management Services on Windows. You use this command to manage the Windows service for the OMS on a failover host in a Cold Failover Cluster setup.

Usage:

emctl create service [-user <username>] [-pwd <password>]
      -name <servicename>     Name of service to be created

delete service

Valid on Windows only. Deletes the service for the Oracle Management Services on Windows.

Usage:

emctl delete service
      -name <servicename>     Name of service to be deleted

resyncAgent

Resynchronizes a restored or reinstalled Agent by pushing all target configuration from the repository. As the usage below shows, this command is invoked through emcli rather than emctl.

Usage:

emcli resyncAgent -agent="Agent Name"
        [-keep_blocked]
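
When several Agents have been reinstalled, the command can be repeated for each of them, as in the following sketch. The resync_agents helper is hypothetical. It assumes that an EM CLI session has already been set up, that each argument is an Agent target name as it appears in the console, and that emcli returns a nonzero exit status on failure.

```shell
# Sketch only: resync_agents is a hypothetical helper. It assumes an EM CLI
# session already exists and that emcli exits nonzero on failure.
resync_agents() {
  local agent failed=0
  for agent in "$@"; do
    if ! emcli resyncAgent -agent="$agent"; then
      echo "resync failed for $agent" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

The loop continues after a failure so that one unreachable Agent does not prevent the others from being resynchronized; the return status reports whether any call failed.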