Data Pump Log Analyzer: Comprehensive Guide

Introduction​

The Data Pump Log Analyzer is a powerful Python script designed to parse and analyze Oracle Data Pump log files. The tool provides valuable insights into Data Pump operations, key metrics, and performance data.

Installation​

  • Ensure you have Python installed on your system
  • Minimum required version: Python 3.6
  • Check with the following command:
    python3 --version
  • Download the script from GitHub (or clone the repository) and store it on the server where you want to run it (see the example below)
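
A minimal sketch, assuming you use Git and have network access from the server (the repository URL is a placeholder; use the location listed on the project's GitHub page):

git clone <repository-url>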

Basic Usage​

The basic syntax for running the Data Pump Log Analyzer is:

python3 dpla.py <file> [options]

Replace <file> with the path to your Data Pump log file and [options] with the desired command-line options.

The script uses a #!/usr/bin/env python3 shebang line, so you can also run it directly without the python3 command:

  • Set execute permission with chmod +x dpla.py
  • Now you can run the script with:
./dpla.py <file> [options]

Supported Logfiles​

Data Pump Log Analyzer has been tested across various database versions with Data Pump logfiles generated by:

  • Data Pump client (expdp/impdp)
  • Zero Downtime Migration (ZDM)
  • OCI Database Migration (DMS)
  • Data Pump API (DBMS_DATAPUMP)
info

To enable Data Pump Log Analyzer to extract all the relevant data, it is crucial that the logfiles are created with the METRICS=YES option in Data Pump. This setting is key for the tool and is also a recommended best practice.
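
For reference, a minimal export sketch with metrics enabled (the connect string, directory, dump file, and logfile names below are placeholders, not part of the tool's documentation):

expdp system@orclpdb schemas=HR directory=DATA_PUMP_DIR dumpfile=hr.dmp logfile=hr_export.log metrics=yes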

While the script is designed to handle a wide range of logfiles, it's important to note that every logfile can differ based on the data processed. Therefore, there may be cases where the tool does not work as intended with certain logfiles. If the script does not recognize the logfile as valid, it will display an info message to inform you.

Restrictions: Currently Data Pump Log Analyzer only supports logfiles in English. Support for other languages is on the roadmap.

Command Options​

Use -h to show all available options:

./dpla.py -h
-h, --help            show this help message and exit
-v, --version         show program's version number and exit
-e [MESSAGE ...], --error [MESSAGE ...]
                      show error details (optionally specify error(s) as a filter)
-o, --object          show object type details
-w, --worker          show worker details
-s [SCHEMA ...], --schema [SCHEMA ...]
                      show schema details (optionally specify schema(s) as a filter)
-t [TABLE ...], --table [TABLE ...]
                      show table details (optionally specify table(s) as a filter)
-i, --instance        show instance details (starting 21c)
-a, --all             show complete output
--sort <column>       specify column name to sort the tables by
--top <N|all>         specify number of top rows to display (use 'all' for no limit)
--output <filename>   specify output file. For HTML output, use .htm or .html extension

Analysis Use Cases​

In the following examples, we use the script as follows:

  • Executed directly with ./dpla.py and without the python3 command
  • Use file.log as the Data Pump input logfile
  • Use the short single-dash options instead of the longer double-dash options:
    • Example: -w instead of --worker

1. Summary Overview​

To view the basic summary overview that shows Logfile Details, Operation Details and Data Processing insights:

./dpla.py file.log

2. Errors/Messages​

To view all ORA- messages encountered during the Data Pump operation:

./dpla.py file.log -e

You can also filter for specific messages:

./dpla.py file.log -e ORA-39082 ORA-31684

You don't need to specify the full error code. The filtering also works with substring matching, so you can specify 3168, for example. It's also possible to search for all messages related to FUNCTION:

./dpla.py file.log -e FUNCTION

3. Object Types​

To see details about which types of database objects were involved in the Data Pump operation:

./dpla.py file.log -o

4. Workers​

Examine the performance of Data Pump workers to identify bottlenecks and imbalances:

./dpla.py file.log -w 

5. Schemas​

To view detailed information about all schemas:

./dpla.py file.log -s 

Use filtering for a detailed look at specific schemas:

./dpla.py file.log -s SCOTT HR

6. Tables​

To view detailed information about all tables:

./dpla.py file.log -t 

Use filtering for a detailed look at specific tables:

./dpla.py file.log -t DEPT EMPLOYEES ORDERS

Schema/Filter Mode​

When a schema filter is active and -t is specified, the table filter is automatically set to list only tables from the filtered schemas:

./dpla.py file.log -s SCOTT -t

In this example we get detailed information for the SCOTT schema plus all the tables in that schema.

7. Instances​

For databases running Oracle 21c or later, you can view instance-specific details:

./dpla.py file.log -i

8. Comprehensive​

To get a complete comprehensive output:

./dpla.py file.log -a

This displays all available information, combining the output of all other options.

Advanced Features​

The filter, sort and limit information is displayed directly under the table header for each section.

Example Output
SCHEMA DETAILS
~~~~~~~~~~~~~~
(sorted by objects, top 20):

In this example we show the top 20 schemas sorted by objects.

Sorting Results​

Sort the table details by a specific column:

./dpla.py file.log -t --sort rows

This example sorts tables by the number of rows processed.

tip

You don't need to specify the full column name. Partial names are supported, for example:

  • ob / obj / ... will match objects
  • sc / sche / ... will match schema
  • ...
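
For example, the following sorts the schema details by the objects column using only a prefix of the column name:

./dpla.py file.log -s --sort obj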

Limiting Output​

Display only the top N results:

./dpla.py file.log -t --top 10

This shows details for the top 10 tables processed.

Filter/Sort/Limit​

You can combine filter, sort, and limit in one command:

./dpla.py file.log -t emp --sort rows --top 5

Tables are filtered by *emp*, then sorted by rows, and the output is finally limited to the top 5 tables.

Output Reports​

To store the generated output in a file, use the --output option. Data Pump Log Analyzer supports two different output formats:

Text Report​

Save the output results to a text file (you can use any extension except .htm and .html):

./dpla.py file.log -a --output dpla_report.txt 

HTML Report​

For HTML output, use a .htm or .html file extension:

./dpla.py file.log -a --output dpla_report.html

Monitoring Operations​

The script can also be executed during ongoing Data Pump import or export operations. This allows you to gain valuable insights and monitor the progress of the operation in real-time. By running the script while operations are active, you can quickly check the status or identify potential issues.
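
One possible approach on Linux, shown here only as a sketch, is to rerun the script periodically with watch while the operation keeps writing to the logfile:

watch -n 60 ./dpla.py file.log -w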

Default Settings​

The default sort and top values can be changed in the script and set per section:

defaults = {
    'oramsg':   {'sort': 'count',   'top': None},
    'object':   {'sort': 'seconds', 'top': None},
    'worker':   {'sort': 'seconds', 'top': None},
    'schema':   {'sort': 'seconds', 'top': None},
    'table':    {'sort': 'seconds', 'top': 30},
    'instance': {'sort': 'seconds', 'top': None}
}
info

By default, the output for the Table Details section is limited to the top 30 rows.
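
If you prefer not to edit the script, the same limit can also be lifted for a single run on the command line:

./dpla.py file.log -t --top all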

Conclusion​

The Data Pump Log Analyzer is a versatile tool that can significantly streamline the process of analyzing Oracle Data Pump operations. By providing detailed insights into errors, object types, worker threads, schemas, and tables, it enables database professionals to quickly identify issues, optimize performance, and gain a comprehensive understanding of their data movement processes.

Whether you're troubleshooting a failed import, analyzing the performance of a large export, or simply want to gain insights into your regular Data Pump operations, this tool offers the flexibility and depth of analysis needed.

Remember to regularly check for updates and new features, as the tool continues to evolve with Oracle database updates and user needs.