Data Pump Log Analyzer: Comprehensive Guide
Introduction
The Data Pump Log Analyzer is a powerful Python script designed to parse and analyze Oracle Data Pump log files. The tool provides valuable insights into Data Pump operations, key metrics, and performance data.
Installation
- Ensure you have Python installed on your system
- Minimum required version: Python 3.6
- Check with the following command:
python3 --version
- Download the script from GitHub (or clone the repository) and store it on the server where you want to run it:
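For example, assuming Git is installed on the server (the repository URL below is only a placeholder; substitute the actual project URL):
git clone https://github.com/<owner>/data-pump-log-analyzer.git
cd data-pump-log-analyzer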
Basic Usage
The basic syntax for running the Data Pump Log Analyzer is:
python3 dpla.py <file> [options]
Replace [options] with the desired command-line options and <file> with the path to your Data Pump log file.
The script uses a #!/usr/bin/env python3 shebang line, so it's also possible to run it directly without the python3 keyword:
- Set execute permission with:
chmod +x dpla.py
- Now you can run the script with:
./dpla.py <file> [options]
Supported Logfiles
Data Pump Log Analyzer has been tested across various database versions with Data Pump logfiles generated by:
- Data Pump client (expdp/impdp)
- Zero Downtime Migration (ZDM)
- OCI Database Migration (DMS)
- Data Pump API (DBMS_DATAPUMP)
To enable Data Pump Log Analyzer to extract all the relevant data, it is crucial that the logfiles are created with the METRICS=YES option in Data Pump, as this setting is key for the tool and is also a recommended best practice.
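As an illustration, a minimal export invocation with metrics enabled might look like this (the user, schema, directory object, and file names are placeholders, not prescribed by the tool):
expdp system schemas=HR directory=DATA_PUMP_DIR dumpfile=hr.dmp logfile=hr_exp.log metrics=yes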
While the script is designed to handle a wide range of logfiles, it's important to note that every logfile can differ based on the data processed. Therefore, there may be cases where the tool does not work as intended with certain logfiles. If the script does not recognize the logfile as valid, it will display an info message to inform you.
Restrictions: Currently Data Pump Log Analyzer only supports logfiles in English. Support for other languages is on the roadmap.
Command Options
Use -h to show all available options:
./dpla.py -h
-h, --help show this help message and exit
-v, --version show program's version number and exit
-e [MESSAGE ...], --error [MESSAGE ...]
show error details (optionally specify error(s) as a filter)
-o, --object show object type details
-w, --worker show worker details
-s [SCHEMA ...], --schema [SCHEMA ...]
show schema details (optionally specify schema(s) as a filter)
-t [TABLE ...], --table [TABLE ...]
show table details (optionally specify table(s) as a filter)
-i, --instance show instance details (starting 21c)
-a, --all show complete output
--sort <column> specify column name to sort the tables by
--top <N|all> specify number of top rows to display (use 'all' for no limit)
--output <filename> specify output file. For HTML output, use .htm or .html extension
Analysis Use Cases
In the following examples, we use the script as follows:
- Executed directly with ./dpla.py and no python3 keyword
- Use file.log as the Data Pump input log file
- Use single-dash options instead of the longer double-dash options:
  - Example: -w instead of --worker
1. Summary Overview
To view the basic summary overview that shows Logfile Details, Operation Details and Data Processing insights:
./dpla.py file.log
2. Errors/Messages
To view all ORA- messages encountered during the Data Pump operation:
./dpla.py file.log -e
You can also filter for specific messages:
./dpla.py file.log -e ORA-39082 ORA-31684
You don't need to specify the full error code. The filtering also works with substring matching, so you can specify 3168, for example. It's also possible to search for all messages related to FUNCTION:
./dpla.py file.log -e FUNCTION
3. Object Types
To see details about which types of database objects were involved in the Data Pump operation:
./dpla.py file.log -o
4. Workers
Examine the performance of Data Pump workers to identify bottlenecks and imbalances:
./dpla.py file.log -w
5. Schemas
To view detailed information about all schemas:
./dpla.py file.log -s
Use filtering for a detailed look at specific schemas:
./dpla.py file.log -s SCOTT HR
6. Tables
To view detailed information about all tables:
./dpla.py file.log -t
Use filtering for a detailed look at specific tables:
./dpla.py file.log -t DEPT EMPLOYEES ORDERS
Schema/Filter Mode
When a schema filter is active and -t is specified, the table filter is automatically set to list only tables from the filtered schemas:
./dpla.py file.log -s SCOTT -t
In this example we get detailed information for the SCOTT schema plus all the tables in that schema.
7. Instances
For databases running Oracle 21c or later, you can view instance-specific details:
./dpla.py file.log -i
8. Comprehensive
To get the complete output:
./dpla.py file.log -a
This displays all available information, combining the output of all other options.
Advanced Features
The filter, sort and limit information is displayed directly under the table header for each section.
SCHEMA DETAILS
~~~~~~~~~~~~~~
(sorted by objects, top 20):
In this example we show the top 20 schemas sorted by objects.
Sorting Results
Sort the table details by a specific column:
./dpla.py file.log -t --sort rows
This example sorts tables by the number of rows processed.
You don't need to specify the full column name. Partial names are supported. For example:
- ob/obj/... will match objects
- sc/sche/... will match schema
- ...
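As a quick illustration of partial matching (a hedged example; obj resolves to the objects column as described above):
./dpla.py file.log -s --sort obj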
Limiting Output
Display only the top N results:
./dpla.py file.log -t --top 10
This shows details for the top 10 tables processed.
Filter/Sort/Limit
You can combine filter, sort, and limit in one command:
./dpla.py file.log -t emp --sort rows --top 5
Tables are filtered by *emp*, then sorted by rows, and the output is finally limited to the top 5 tables.
Output Reports
To store the generated output in a file, use the --output option.
Data Pump Log Analyzer supports two different output formats:
Text Report
Save the output to a text file (you can use any extension except .htm and .html):
./dpla.py file.log -a --output dpla_report.txt
HTML Report
For HTML output, use a .htm or .html file extension:
./dpla.py file.log -a --output dpla_report.html
Monitoring Operations
The script can also be executed during ongoing Data Pump import or export operations. This allows you to gain valuable insights and monitor the progress of the operation in real-time. By running the script while operations are active, you can quickly check the status or identify potential issues.
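For instance, one simple way to refresh the worker overview periodically while an operation is running (the watch interval and chosen option are just an illustration, not part of the tool):
watch -n 60 ./dpla.py file.log -w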
Default Settings
The default sort and top values can be changed in the script and set per section:
defaults = {
'oramsg': {'sort': 'count', 'top': None},
'object': {'sort': 'seconds', 'top': None},
'worker': {'sort': 'seconds', 'top': None},
'schema': {'sort': 'seconds', 'top': None},
'table': {'sort': 'seconds', 'top': 30},
'instance': {'sort': 'seconds', 'top': None}
}
By default the output for the Table Details section is limited to the top 30 rows.
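For example, to remove that limit and show all tables, a minimal change is to set top for the table entry to None, matching the other sections:
'table': {'sort': 'seconds', 'top': None},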
Conclusion
The Data Pump Log Analyzer is a versatile tool that can significantly streamline the process of analyzing Oracle Data Pump operations. By providing detailed insights into errors, object types, worker threads, schemas, and tables, it enables database professionals to quickly identify issues, optimize performance, and gain a comprehensive understanding of their data movement processes.
Whether you're troubleshooting a failed import, analyzing the performance of a large export, or simply want to gain insights into your regular Data Pump operations, this tool offers the flexibility and depth of analysis needed.
Remember to regularly check for updates and new features, as the tool continues to evolve with Oracle database updates and user needs.