Tải bản đầy đủ
1 PLVprs: Useful String Parsing Extensions

1 PLVprs: Useful String Parsing Extensions

Tải bản đầy đủ

[Appendix A] Appendix: PL/SQL Exercises
Return the n th atomic in a string

Wrap a long line of text into a paragraph

Display a wrapped line of text
You can use PLVprs within other packages and programs to analyze strings and display their contents. It is
also used within PL/Vision by PLVvu and PLVdyn to display long messages.
All of our applications require manipulation of textual information. I have often encountered the need to parse
and analyze those strings in order to answer requests like the following (and I am sure that you could add
more to the list):

Count the number of words in a string.

Separate out all words and punctuation in a string into separate components.

Return the nth value in a semicolon−delimited string. This is a very common situation in Oracle
Forms applications, in which a developer might pack a set of values into a global variable like this:
"123;5555;6623.11;".

10.1.1 Developing a General Solution
Taken separately, it is not too hard to develop a solution to any of the items on this list. If you build solutions
to each individual requirement on a case−by−case basis, you will end up with an enormous volume of
redundant code which you cannot easily enhance or upgrade.
Rather than construct a smattering of different, specialized routines to analyze strings, I offer with PLVprs a
set of very generic and flexible functions and procedures. Yet the only way to make sure that my approach
will handle many different situations and requirements is to base that approach on a general analysis of the
components of a string.
In the world of PLVprs, a string is made up of a series of atomics, the smallest indivisible elements of the
string. An atomic is either a word (defined as a contiguous set of letters and/or numbers) or a delimiter (any
one of a set of special characters which separate or delimit words). In the English language, for example,
common delimiters, a.k.a. punctuation, include the comma, semicolon, question mark, period, etc. But in a
more general scenario, we have to be careful not to assume that the set of delimiters is a constant. For
example, if I want to perform operations on a string of values delimited by a semicolon, then the set of
delimiters for that particular operation is a single character: the semicolon.
Once you see strings as a set of atomics, of words separated by delimiters, you can reinterpret and enhance the
kinds of requests listed above, such as "count all words in a string." Do you want to count all the atomics or
just the words or just the delimiters? PLVprs gives you that level of flexibility.

10.1.2 Customizing the Delimiter Set
The PLVprs package interprets a string as a series of atomics: words and delimiters. But what is a delimiter?
PLVprs predefines two sets of delimiters:

10.1.1 Developing a General Solution

305

[Appendix A] Appendix: PL/SQL Exercises
Name

Characters in Delimeter Set

std_delimeters

!@#$%^&*()_=+\|`~{{]};:''",<.>/?' plus space, tab, and newline characters

plsql_delimeters !@%^&*()=+\|`~{{]};:''",<.>/?' plus space, tab, and newline characters
The only difference between these lists is that the plsql_delimiters set omits the underscore, dollar
sign, and pound sign characters. These characters are valid symbols in a PL/SQL identifier, which should
certainly be considered a word.
Would you ever need any other delimiter sets? Sure, why not? You might have a string which is packed with
values separated by the vertical bar. In this situation, when you call PLVprs.string to separate the string
into separate atomics, you will want to be able to specify your special and very short delimiter list.
PLVprs lets you specify a non−default delimiter list in the following programs:
next_atom_loc
nth_atomic
display_atomics
string
numatomics
Here is an example of how I could parse a string with atomics packed between vertical bars and ignore any
other delimiters:
DECLARE
atomic PLVtab.vc2000_table;
num INTEGER;
BEGIN
PLVprs.string ('A#−%|12345|(*&*)|0101R|', atomic, num, '|');
PLVtab.display (atomic,num);
END;
/

This is the output seen after executing the above script:
Contents of Table
A#−%
|
12345
|
(*&*)
|
0101R
|

With the default set of delimiters, this string would have been broken up by PLVprs.string into fifteen
separate atomics, rather than just eight. So don't forget to use your own delimiter list as needed to simplify
your parsing and analysis jobs.

10.1.3 Parsing Strings into Atomics
The PLVprs package offers a number of programs that perform different parsing operations on strings. They
are each discussed below. The implementations of most of these programs were discussed in Oracle PL/SQL
Programming, so I will not go beyond an explanation here of how you can use these functions.

10.1.3 Parsing Strings into Atomics

306

[Appendix A] Appendix: PL/SQL Exercises
10.1.3.1 next_atom_loc function
The next_atom_loc returns the location in the specified string of the next atomic. It is used by other
PLVprs programs, but is also available for use by other PL/Vision packages −− and by you. Its header is:
FUNCTION next_atom_loc
(string_in IN VARCHAR2,
start_loc_in IN NUMBER,
direction_in IN NUMBER := +1,
delimiters_in IN VARCHAR2 := std_delimiters)
RETURN INTEGER;

The string_in parameter is the string to be scanned. The start_loc_in parameter provides the starting
position of the search for the start of the next atomic. The direction_in parameter indicates whether the
search should move forward or backward through the string. The final argument allows you to specify the set
of characters to be considered delimiters (which indicate the start of a new atomic).
The next_atom_loc function returns the location in the string of the starting point of the next atomic
(from the start location). The function scans forward if direction_in is +1, otherwise it scans backwards
through the string. Here is the logic to determine when the next atomic starts:
1.
If the current atomic is a delimiter (that is, if the character at the start_loc_in of the string is a
delimiter), then the next character starts the next atomic since all delimiters are a single character in
length.
2.
If the current atomic is a word (that is, if the character at the start_loc_in of the string is a letter
or number), then the next atomic starts at the next delimiter. Any letters or numbers in between are
part of the current atomic.
The next_atomic_loc function loops through the string one character at a time and applies these tests. It
also has to check for the end of string. If it scans forward, the end of string comes when the SUBSTR that
pulls out the next character returns NULL. If it scans backward, then the end of the string comes when the
location is less than 0.
10.1.3.2 display_atomics procedure
The display_atomics procedure displays the atomics found in the specified string. Its header is shown
below:
PROCEDURE display_atomics
(string_in IN VARCHAR2,
delimiters_in IN VARCHAR2 := std_delimiters);

You specify the string you want parsed and the set of characters you want to be treated as delimiters for the
parsing. To make it easier to view blank lines, spaces are displayed as a description of the number of spaces
present (as in "1 blank" or "6 blanks"). This feature is shown below:
SQL> exec PLVtab.showrow
SQL> exec PLVprs.display_atomics ('Compassion is a human thing.');
Parsed Atomics in String
Row 1 = Compassion
Row 2 = 1 blank
Row 3 = is
Row 4 = 1 blank
Row 5 = a
Row 6 = 1 blank

10.1.3 Parsing Strings into Atomics

307

[Appendix A] Appendix: PL/SQL Exercises
Row
Row
Row
Row

7 = human
8 = 2 blanks
9 = thing
10 = .

You can specify an alternate delimiter set in order to display the contents of a string according to the format of
that string and how it is used. Compare the two calls to display_atomics below:
SQL> exec PLVprs.display_atomics ('1234|A$%|67YYY|(big)');
11
Parsed Atomics in String
1234
−|−
A
−$−
−%−
−|−
67YYY
−|−
−(−
big
−)−
SQL> exec PLVprs.display_atomics ('1234|A$%|67YYY|(big)', '|');
7
Parsed Atomics in String
1234
−|−
A$%
−|−
67YYY
−|−
(big)

In the second call, I am telling display_atomics to consider "|" as the sole delimiter in the string.
10.1.3.3 numatomics function
The numatomics function returns the number of atomics in a string. You can calculate the number of all
atomics, only the words, or only the delimiters. You can specify in your call to the function what characters
should be considered delimiters. The header for this function is:
FUNCTION numatomics
(string_in IN VARCHAR2,
count_type_in IN VARCHAR2 := c_all,
delimiters_in IN VARCHAR2 := std_delimiters)
RETURN INTEGER;

The following examples demonstrate how to use the count_type_in argument. The first call relies on the
default value of "all atomics." The next two calls pass in a specific request for type of atomic, relying on the
constants provided in the package specification.
SQL> exec p.l (PLVprs.numatomics ('this, is%not.'))
7
SQL> exec p.l (PLVprs.numatomics ('this, is%not.', PLVprs.c_word))
3
SQL> exec p.l (PLVprs.numatomics ('this, is%not.', PLVprs.c_delim))
4

10.1.3.4 nth_atomic function
The nth_atomic function returns the n th atomic in a string. You can, for example, ask for the seventh
word or the sixth delimiter. The header for nth_atomic is:
10.1.3 Parsing Strings into Atomics

308

[Appendix A] Appendix: PL/SQL Exercises
FUNCTION nth_atomic
(string_in IN VARCHAR2,
nth_in IN NUMBER,
count_type_in IN VARCHAR2 := c_all,
delimiters_in IN VARCHAR2 := std_delimiters)
RETURN VARCHAR2;

The nth_atomic function is very flexible, following the model of the builtin functions, INSTR and
SUBSTR. You can scan both forward and backward through the string. If you provide a positive nth_in
argument, then nth_atomic scans forward to the nth atomic. If, on the other hand, the nth_in value is
negative, nth_atomic will scan backwards through the string to the nth atomic. This feature is shown in the
examples below:
SQL>
,
SQL>
is
SQL>
%
SQL>
this

exec p.l(PLVprs.nth_atomic ('this, is%not.', 2))
exec p.l(PLVprs.nth_atomic ('this, is%not.', 2, PLVprs.c_word))
exec p.l(PLVprs.nth_atomic ('this, is%not.', 3, PLVprs.c_delim))
exec p.l(PLVprs.nth_atomic ('this, is%not.', −3, PLVprs.c_word))

The PLVdyn package also utilizes nth_atomic function to extract the type of program unit and the name of
the program unit that is being compiled and stored by the compile procedure:
/* Get the first word.*/
v_name1 := PLVprs.nth_atomic (stg_in, 1, PLVprs.c_word);
/* Get the second word. */
v_name2 := PLVprs.nth_atomic (stg_in, 2, PLVprs.c_word);
/* If a package body, then get the THIRD word. */
IF UPPER (v_name1||' '||v_name2) = 'PACKAGE BODY'
THEN
v_name1 := v_name1 || ' ' || v_name2;
v_name2 := PLVprs.nth_atomic (stg_in, 3, PLVprs.c_word);
END IF;

10.1.3.5 string procedures
There are two, overloaded versions of the string procedure, which parses a string into its separate atomics.
The string procedure is available to users of the PLVprs package; it is also used by programs in the
PLVprs package. The display_atomics procedure, for example, calls the string procedure to parse the
specified string and then calls PLVtab.display to display the table that contains the parsed atomics.
The headers for the string procedures are:
PROCEDURE string
(string_in IN VARCHAR2,
atomics_list_out OUT PLVtab.vc2000_table,
num_atomics_out IN OUT NUMBER,
delimiters_in IN VARCHAR2 := std_delimiters);
PROCEDURE string
(string_in IN VARCHAR2,
atomics_list_out IN OUT VARCHAR2,
num_atomics_out IN OUT NUMBER,
delimiters_in IN VARCHAR2 := std_delimiters);

The first argument is the string to be parsed. The second and third arguments contain the parsed output of the
procedure call. The fourth argument allows you to specify an alternative set of delimiter characters.

10.1.3 Parsing Strings into Atomics

309

[Appendix A] Appendix: PL/SQL Exercises
The table version of string fills a PL/SQL table with the atomics of the string, one atomic per row. The
VARCHAR2 version of the string procedure returns the atomics in a string separated by the vertical bar
delimiter. This "string version" of string simply calls the table version and then dumps the contents of the
table into a string.
10.1.3.6 numinstr function
The numinstr function is a good example of how you can supplement the fine, but finite set of builtin string
functions with your own basic and quite reusable functions. The numinstr function returns the number of
times a substring appears in a string. Its header is:
FUNCTION numinstr
(string_in IN VARCHAR2,
substring_in IN VARCHAR2,
ignore_case_in IN VARCHAR2 := c_ignore_case)
RETURN INTEGER;

The first argument is the string, the second is the substring, and the third allows you to specify whether you
want case to be ignored or respected in the search. The following examples illustrate this flexibility:
SQL> exec p.l (PLVprs.numinstr ('abcabC', 'c'));
2
SQL> exec p.l (PLVprs.numinstr ('abcabC', 'c', PLVprs.c_respect_case));
1

In the following code fragment, I call numinstr to determine the number of placeholders for bind variables
in a dynamically constructed SQL string (a placeholder is defined as a colon followed by a PL/SQL identifier,
such as :newname or :bindvar1). I then use a numeric FOR loop to issue a call to the
BIND_VARIABLE procedure of the builtin DBMS_SQL package for each of the bind variables.
numvars := PLVprs.numinstr (sql_string, ':');
FOR bindvar_ind IN 1 .. numvars
LOOP
DBMS_SQL.BIND_VARIABLE
(cur_handle,
'bindvar' || TO_CHAR (bindvar_ind),
variables_table (bindvar_in));
END LOOP;

Finding the Best Solution
I believe I am a fairly good PL/SQL programmer. I can churn out hundreds of lines of complicated code that
works, more or less, after just a few rounds of revisions. On the other hand, I feel I am also very much open to
the possibility that others can and have done better −− and that I can learn from them. None of us has all the
answers −− and some of us have more answers than others.
Let me give you an example. In my first book on PL/SQL, Oracle PL/SQL Programming (O'Reilly &
Associates), I give every impression that I know what I am talking about −− and a big part of what I talk
about is writing concise, high−quality code. In Chapter 11, Character Functions, I take my readers through
the exercise of building a function to count the number of times a substring occurs in a string (a function not
provided by PL/SQL). I end up with two implementations, the shorter of which is shown in Example 10.1.
Example 10.1: Counting the Frequency of a Substring Within a String
FUNCTION numinstr
(string_in IN VARCHAR2, substring_in IN VARCHAR2)
RETURN INTEGER
IS
substring_loc NUMBER;

10.1.3 Parsing Strings into Atomics

310

[Appendix A] Appendix: PL/SQL Exercises
return_value NUMBER := 1;
BEGIN
LOOP
substring_loc :=
INSTR (UPPER (string_in),
UPPER (substring_in), 1, return_value);
/* Terminate loop when no more occurrences are found. */
EXIT WHEN substring_loc = 0;
/* Found match, so add to total and continue. */
return_value := return_value + 1;
END LOOP;
RETURN return_value − 1;
END numinstr;

I was quite content with this function −− until I received an email from Kevin Loney, author of ORACLE
DBA Handbook (Oracle Press). This email very politely complimented me on my book and offered an
alternative implementation for numinstr −− a simpler, more efficient, and more elegant implementation.
As Kevin noted, he came up with this solution before the days of PL/SQL, when all he had to work with was
SQL (a set−at−a−time, nonprocedural language). In his approach (shown in Example 10.2), he took advantage
of the REPLACE builtin function to substitute NULL for any occurrences of the substring. He could then
compare the size of the original string with the "replaced" string and use that as the basis for his calculation.
Example 10.2: Counting the Frequency of a Substring Within a String
FUNCTION numinstr
(string_in IN VARCHAR2, sub_string_in IN VARCHAR2)
RETURN INTEGER
/* Divide difference of two lengths by length of substring. */
IS
BEGIN
RETURN
((LENGTH (string_in) −
NVL (LENGTH (REPLACE (string_in, sub_string_in)), 0))
/ LENGTH (sub_string_in));
END;

I felt a wave of embarrassment wash over me for just a moment when I first read Kevin's note. Then I
regained sanity. Of course there are better ways of doing things. Discovering and sharing these
improvements −− programming altruism −− is one of the finest aspects of our work. And programming
humility −− the willingness to accept these improvements −− can make our lives much easier. If you are
open to the ideas of others, then you are also open to the idea of using the work of others (with permission!).
This means that you will spend much less time coding something that is already available −− you will avoid
reinventing the wheel. Kevin's implentation for numinstr (enhanced to ignore or respect case) is now in
version I provide in PLVprs.

10.1.4 Wrapping Strings into Paragraphs
How many times have you been confronted with the need to display a long string (defined, essentially, as
anything longer than 80 characters) in a format which can be read easily? Unfortunately, PL/SQL does not
have any paragraph−wrapping capabilities built into it. All you have is DBMS_OUTPUT.PUT_LINE, which
accepts a maximum of 255 characters and is displayed in whatever manner handled by your environment.
PLVprs fills this gap in functionality by providing the following suite of string−wrapping programs:
wrap
Wraps a long string into a series of lines with a maximum specified length, each line of which is
stored in consecutive rows in a PL/SQL table.
10.1.3 Parsing Strings into Atomics

311

[Appendix A] Appendix: PL/SQL Exercises
wrapped_string
Returns a long string wrapped into a series of lines separated by newline characters.
display_wrap
Displays a long string in paragraph−wrapped form at the specified length.
The wrap procedure supplies the core paragraph−wrapping functionality and is called both by
wrapped_string and display_wrap. All three programs are described below.
10.1.4.1 wrap procedure
The header for the wrap procedure is:
PROCEDURE wrap
(text_in IN VARCHAR2,
line_length IN INTEGER,
paragraph_out IN OUT PLVtab.vc2000_table,
num_lines_out IN OUT INTEGER);

The first parameter is the text to be wrapped. The second parameter, line_length, is the length of the line
into which the long text is to be wrapped. The paragraph_out argument is the PL/SQL table which will
receive the wrapped text (each row, starting from 1, contains a single line of text). The num_lines_out
argument contains the number of lines of text that have been placed in paragraph_out.
Examples of PLVprs.wrap are shown in the following sections.
10.1.4.2 display_wrap procedure
The display_wrap procedure comes in handy when you only want to display the end result −− the
wrapped string. If you have no need to store that string in a PL/SQL table or a single string variable for further
manipulation, call display_wrap. Then you will not have to declare temporary data structures to hold the
wrapped text until it is displayed.
PROCEDURE display_wrap
(text_in IN VARCHAR2,
line_length IN INTEGER := 80,
prefix_in IN VARCHAR2 := NULL);

The first parameter, text_in, is the string to be wrapped and then displayed. The second argument,
line_length, is the length of the line within which the string is wrapped. The third argument,
prefix_in, allows you to specify an optional prefix to display before each line of the wrapped string.
Here is an example of display_wrap:
SQL> exec PLVprs.display_wrap (RPAD ('a string',120,' to remember'),30);
a string to remember to
remember to remember to
remember to remember to
remember to remember to
remember to remember to

The implementation of display_wrap is short and sweet. It calls the wrap procedure to fill up a locally
declared PL/SQL table with the wrapped lines of text. It then calls PLVtab.display to display the
contents of that table.
The p package of PL/Vision makes use of the display_wrap procedure to automatically wrap long lines of
text. As a result, when you call p.l, you don't have to worry about the DBMS_OUTPUT restriction of 255
characters displayed in a string. Instead, it will automatically check the length of the string and wrap the
10.1.4 Wrapping Strings into Paragraphs

312

[Appendix A] Appendix: PL/SQL Exercises
output as shown below:
ELSIF LENGTH (line_in) > 80
THEN
PLVprs.display_wrap (line_in, 75, NULL);

Now why did I include a third argument of NULL in my call to display_wrap? That (lack of a) value is,
after all, the default. I could simply have called display_wrap as follows:
PLVprs.display_wrap (line_in, 75);

and received the same results.
Usually I am a strong advocate of typing only the absolute minimum necessary to get the job done. Have I
violated my own guidelines? Not really. I have, in fact, typed exactly what I needed to get the job done. The
question you should now ask is: What is my "job" or requirement? I want to display a wrapped string without
any prefix. The way I do that is to pass NULL for the prefix; hence, the inclusion of three arguments.
Notice that I did not state my requirement as follows: "I want to display a wrapped string with the default
prefix." If that were my desire, I should pass just two arguments. That way, if the default ever changes, my
call to display_wrap will automatically adapt to the new value. But in the p.l package I would not want
the output from PLVprs.display_wrap to change when the default value changes. I really do want a
NULL value, regardless of the actual default value for the display_wrap procedure. In p.l's use of
display_wrap, in other words, the fact that the desired prefix (none) is the same as the default is nothing
more than a coincidence.
This distinction between needing the default value and happening to match the default value is an important
one. If you simply have a coincidence, do not rely on the default value. Instead, pass in the value, even if it is
currently the default and not, strictly speaking, needed. Your code will be less likely to break as the
underlying layers of code you rely on change.
10.1.4.3 wrapped_string function
The last string−wrapping program is the wrapped_string function. This function returns a string with the
wrapped lines of text concatenated together, with newline characters after every wrapped line. The
specification for wrapped_string is as follows:
FUNCTION wrapped_string
(text_in IN VARCHAR2,
line_length IN INTEGER := 80,
prefix_in IN VARCHAR2 := NULL)
RETURN VARCHAR2;

The first argument, text_in, is the string to be wrapped and then displayed. The second argument,
line_length, is the length of the line within which the string is wrapped. The third argument,
prefix_in, allows you to specify an optional prefix to display before each line of the wrapped string.
This function is useful in situations (such as that currently found in Oracle Developer/2000 Version 1) where
you cannot reference and use PL/SQL tables. The wrapped_string returns the text in a string which can
then be immediately displayed in the right format (because of the embedded newline characters) or passed to
another program which works with the wrapped text.
Special Notes on PLVprs
Here are some factors to consider when working with PLVprs:

10.1.4 Wrapping Strings into Paragraphs

313

[Appendix A] Appendix: PL/SQL Exercises
PLVprs does not currently recognize many nonprinting characters as delimiters (only newline and
tab are in the predefined delimiter lists). If you have, for example, a backspace character in your
string, this will not be treated as a delimiter. You can, of course, define your own delimiter list and
then pass that set of characters to PLVprs's program.

When you wrap a string, all carriage returns are replaced with single characters. Your "hard−coded"
line breaks will be ignored in the wrap process.

9.6 Implementing load_
from_dbms

10.2 PLVtkn: Managing
PL/SQL Tokens

Copyright (c) 2000 O'Reilly Associates. All rights reserved.

10.1.4 Wrapping Strings into Paragraphs

314

Chapter 10
PLVprs, PLVtkn, and
PLVprsps: Parsing Strings

10.2 PLVtkn: Managing PL/SQL Tokens
The PLVtkn (PL/Vision ToKeN) package determines whether an identifier is a PL/SQL keyword. It does this
by using a token table containing these keywords.
Every PL/SQL program is filled with identifiers. Identifiers are named PL/SQL language elements and
include variable names, program names, and reserved words. Reserved words play a very different role in our
programs than do the application−specific identifiers. I recommend strongly in Oracle PL/SQL Programming
that you reflect these different roles in your program by using the UPPER−lower method: all reserved words
are typed in UPPER case and all application−specific identifiers are typed in lower case. I even go so far, in
PL/Vision, as to provide you with a package (PLVcase) which will automatically convert your programs to
the UPPER−lower method.

10.2.1 Keeping Track of PL/SQL Keywords
Well, if PLVcase is going to uppercase only keywords, it has to know which identifiers in a PL/SQL
program are the reserved words. This information is maintained in the PLV_token table, which has the
following structure:
Name
Null?
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−
TOKEN
NOT NULL
TOKEN_TYPE

Type
−−−−
VARCHAR2(100)
VARCHAR2(10)

where the token column is the identifier and token_type indicates the type.
The different token types in the PLV_token table are stored in the PLV_token_type table, which has this
structure:
Name
Null?
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−
TOKEN_TYPE
NAME
NOT NULL

Type
−−−−
VARCHAR2(10)
VARCHAR2(100)

The contents of the PLV_token_type are explained in the following table:
Token Type Name

Description

B

BUILT−IN

Builtin functions and procedures of the PL/SQL language, including
packaged builtins.

D

DATATYPE

Different datatypes of the PL/SQL language, such as INTEGER and
VARCHAR2.

DD

DATA−DICTIONARY Views and tables from the Oracle Server data dictionary, such as
ALL_SOURCE and DUAL.

E

EXCEPTION

Predefined system exceptions such as ZERO_DIVIDE.
315