Data Pump Log Analyzer: Comprehensive Guide

Introduction​

The Data Pump Log Analyzer is a powerful Python script designed to parse and analyze Oracle Data Pump log files. The tool provides valuable insights into Data Pump operations, key metrics, and performance data.

Installation​

  • Ensure you have Python installed on your system:
    • Minimum required version: Python 3.6
    • Check with the following command:
      python3 --version
  • Download the script from GitHub (or clone the repository) and store it on the server where you want to run it:
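For example, assuming you use git, you can clone the repository and change into the resulting directory (replace <repository-url> and <repository-directory> with the actual values from the project's GitHub page):

git clone <repository-url>
cd <repository-directory>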

Basic Usage​

The basic syntax for running the Data Pump Log Analyzer is:

python3 dpla.py <file> [options]

Replace [options] with the desired command-line options and <file> with the path to your Data Pump log file.

The script uses a #!/usr/bin/env python3 shebang line, so it's also possible to run it directly without invoking python3 explicitly:

  • Set execute permission with chmod +x dpla.py
  • Now you can run the script with:
./dpla.py <file> [options]

Supported Logfiles​

Data Pump Log Analyzer has been tested across various database versions with Data Pump logfiles generated by:

  • Data Pump client (expdp/impdp)
  • Zero Downtime Migration (ZDM)
  • OCI Database Migration (DMS)
  • Data Pump API (DBMS_DATAPUMP)
info

To enable Data Pump Log Analyzer to extract all relevant data, it is crucial that the logfiles are created with the METRICS=YES option in Data Pump. This setting is key for the tool and is also a recommended best practice.
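For reference, here is a minimal expdp sketch with metrics enabled; the user, schema, directory, and file names are placeholders, and LOGTIME=ALL is an additional commonly used option rather than a requirement of the tool:

expdp system schemas=HR directory=DATA_PUMP_DIR dumpfile=hr.dmp logfile=hr_exp.log metrics=yes logtime=all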

While the script is designed to handle a wide range of logfiles, every logfile can differ based on the data processed. Therefore, there may be cases where the tool does not work as intended with certain logfiles. If the script does not recognize the logfile as valid, it displays an info message to inform you.

Restrictions: Currently, Data Pump Log Analyzer only supports logfiles in English. Support for other languages is on the roadmap.

Command Options​

Use -h to show all available options:

./dpla.py -h
-h, --help            show this help message and exit
-v, --version         show program's version number and exit
-e [MESSAGE ...], --error [MESSAGE ...]
                      show error details (optionally specify error(s) as a filter)
-o, --object          show object type details
-w, --worker          show worker details
-s [SCHEMA ...], --schema [SCHEMA ...]
                      show schema details (optionally specify schema(s) as a filter)
-t [TABLE ...], --table [TABLE ...]
                      show table details (optionally specify table(s) as a filter)
-i, --instance        show instance details (starting 21c)
-a, --all             show complete output
--sort <column>       specify column name to sort the tables by
--top <N|all>         specify number of top rows to display (use 'all' for no limit)
--output <filename>   specify output file. For HTML output, use .htm or .html extension

Analysis Use Cases​

In the following examples, we use the script as follows:

  • Executed directly with ./dpla.py, without invoking python3 explicitly
  • Use file.log as Data Pump input log file
  • Use single dash instead of longer double dash options:
    • Example: -w instead of --worker

1. Summary Overview​

To view the basic summary overview that shows Logfile Details, Operation Details and Data Processing insights:

./dpla.py file.log

2. Errors/Messages​

To view all ORA- messages encountered during the Data Pump operation:

./dpla.py file.log -e

You can also filter for specific messages:

./dpla.py file.log -e ORA-39082 ORA-31684

You don't need to specify the full error code; filtering uses substring matching, so you can also specify 3168, for example. This also makes it possible to search for all messages related to FUNCTION:

./dpla.py file.log -e FUNCTION

3. Object Types​

To see details about which types of database objects were involved in the Data Pump operation:

./dpla.py file.log -o

4. Workers​

Examine the performance of Data Pump workers to identify bottlenecks and imbalances:

./dpla.py file.log -w 

5. Schemas​

To view detailed information about all schemas:

./dpla.py file.log -s 

Use filtering for a detailed look at specific schemas:

./dpla.py file.log -s SCOTT HR

6. Tables​

To view detailed information about all tables:

./dpla.py file.log -t 

Use filtering for a detailed look at specific tables:

./dpla.py file.log -t DEPT EMPLOYEES ORDERS

Schema/Filter Mode​

When a schema filter is active and -t is specified, the table filter is automatically set to list only tables from the filtered schemas:

./dpla.py file.log -s SCOTT -t

In this example we get detailed information for the SCOTT schema plus all the tables in that schema.

7. Instances​

For databases running Oracle 21c or later, you can view instance-specific details:

./dpla.py file.log -i

8. Comprehensive​

To get a complete comprehensive output:

./dpla.py file.log -a

This displays all available information, combining the output of all other options.

Advanced Features​

The filter, sort and limit information is displayed directly under the table header for each section.

Example Output
SCHEMA DETAILS
~~~~~~~~~~~~~~
(sorted by objects, top 20):

In this example we show the top 20 schemas sorted by objects.

Sorting Results​

Sort the table details by a specific column:

./dpla.py file.log -t --sort rows

This example sorts tables by the number of rows processed.

tip

You don't need to specify the full column name. Partial names are supported, as shown below. For example:

  • ob / obj / ... will match objects
  • sc / sche / ... will match schema
  • ...
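For instance, the following combines a partial column name with the schema details shown earlier; the partial name obj resolves to the objects column:

./dpla.py file.log -s --sort obj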

Limiting Output​

Display only the top N results:

./dpla.py file.log -t --top 10

This shows details for the top 10 tables processed.

Filter/Sort/Limit​

You can combine filter, sort, and limit in one command:

./dpla.py file.log -t emp --sort rows --top 5

Tables are filtered by *emp*, then sorted by rows, and the output is finally limited to the top 5 tables.

Output Reports​

To store the generated output in a file, use the --output option. Data Pump Log Analyzer supports two different output formats:

Text Report​

Save the output to a text file (you can use any extension except .htm and .html):

./dpla.py file.log -a --output dpla_report.txt 

HTML Report​

For HTML output, use a .htm or .html file extension:

./dpla.py file.log -a --output dpla_report.html

Monitoring Operations​

The script can also be executed during ongoing Data Pump import or export operations. This allows you to gain valuable insights and monitor the progress of the operation in real-time. By running the script while operations are active, you can quickly check the status or identify potential issues.
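One simple way to poll an active operation is to re-run the script at a fixed interval, for example with the standard watch utility (the 30-second interval and the -w option here are just an illustration):

watch -n 30 ./dpla.py file.log -w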

Default Settings​

The default sort and top values can be changed in the script and set per section:

defaults = {
    'oramsg':   {'sort': 'count',   'top': None},
    'object':   {'sort': 'seconds', 'top': None},
    'worker':   {'sort': 'seconds', 'top': None},
    'schema':   {'sort': 'seconds', 'top': None},
    'table':    {'sort': 'seconds', 'top': 30},
    'instance': {'sort': 'seconds', 'top': None}
}
info

By default, the output for the Table Details section is limited to the top 30 rows.
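If you prefer not to edit the script, you can also override the defaults at runtime; for example, --top all removes the 30-row limit for the table details:

./dpla.py file.log -t --top all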

Conclusion​

The Data Pump Log Analyzer is a versatile tool that can significantly streamline the process of analyzing Oracle Data Pump operations. By providing detailed insights into errors, object types, worker threads, schemas, and tables, it enables database professionals to quickly identify issues, optimize performance, and gain a comprehensive understanding of their data movement processes.

Whether you're troubleshooting a failed import, analyzing the performance of a large export, or simply want to gain insights into your regular Data Pump operations, this tool offers the flexibility and depth of analysis needed.

Remember to regularly check for updates and new features, as the tool continues to evolve with Oracle database updates and user needs.