2-5. Validate Input Using Regular Expressions

CHAPTER 2 ■ DATA MANIPULATION



Solution

Use regular expressions to ensure that the input data follows the correct structure and contains only

valid characters for the expected type of information.



How It Works

When a user inputs data to your application or your application reads data from a file, it’s good practice

to assume that the data is bad until you have verified its accuracy. One common validation requirement

is to ensure that data entries such as e-mail addresses, telephone numbers, and credit card numbers

follow the pattern and content constraints expected of such data. Obviously, you cannot be sure the

actual data entered is valid until you use it, and you cannot compare it against values that are known to

be correct. However, ensuring the data has the correct structure and content is a good first step to

determining whether the input is accurate. Regular expressions provide an excellent mechanism for

evaluating strings for the presence of patterns, and you can use this to your advantage when validating

input data.

The first thing you must do is figure out the regular expression syntax that will correctly match the

structure and content of the data you are trying to validate. This is by far the most difficult aspect of

using regular expressions. Many resources exist to help you with regular expressions, such as The

Regulator (http://osherove.com/tools), and RegExDesigner.NET, by Chris Sells

(www.sellsbrothers.com/tools/#regexd). The RegExLib.com web site (www.regexlib.com) also provides

hundreds of useful prebuilt expressions.

Regular expressions are constructed from two types of elements: literals and metacharacters.

Literals represent specific characters that appear in the pattern you want to match. Metacharacters

provide support for wildcard matching, ranges, grouping, repetition, conditionals, and other control

mechanisms. Table 2-2 describes some of the more commonly used regular expression metacharacter

elements. (Consult the .NET SDK documentation for a full description of regular expressions. A good

starting point is http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx.)

Table 2-2. Commonly Used Regular Expression Metacharacter Elements

Element   Description
.         Specifies any character except a newline character (\n)
\d        Specifies any decimal digit
\D        Specifies any nondigit
\s        Specifies any whitespace character
\S        Specifies any non-whitespace character
\w        Specifies any word character
\W        Specifies any nonword character
^         Specifies the beginning of the string or line
\A        Specifies the beginning of the string
$         Specifies the end of the string or line
\z        Specifies the end of the string
|         Matches one of the expressions separated by the vertical bar (pipe symbol); for example, AAA|ABA|ABB will match one of AAA, ABA, or ABB (the expression is evaluated left to right)
[abc]     Specifies a match with one of the specified characters; for example, [AbC] will match A, b, or C, but no other characters
[^abc]    Specifies a match with any one character except those specified; for example, [^AbC] will not match A, b, or C, but will match B, F, and so on
[a-z]     Specifies a match with any one character in the specified range; for example, [A-C] will match A, B, or C
( )       Identifies a subexpression so that it's treated as a single element by the regular expression elements described in this table
?         Specifies one or zero occurrences of the previous character or subexpression; for example, A?B matches B and AB, but not AAB
*         Specifies zero or more occurrences of the previous character or subexpression; for example, A*B matches B, AB, AAB, AAAB, and so on
+         Specifies one or more occurrences of the previous character or subexpression; for example, A+B matches AB, AAB, AAAB, and so on, but not B
{n}       Specifies exactly n occurrences of the preceding character or subexpression; for example, A{2} matches only AA
{n,}      Specifies a minimum of n occurrences of the preceding character or subexpression; for example, A{2,} matches AA, AAA, AAAA, and so on, but not A
{n,m}     Specifies a minimum of n and a maximum of m occurrences of the preceding character; for example, A{2,4} matches AA, AAA, and AAAA, but not A or AAAAA
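The quantifier, alternation, and character-class entries in Table 2-2 can be checked directly against the examples given in the table. The following is a minimal sketch rather than part of the recipe: the MetacharacterDemo class and its MatchesWhole helper are hypothetical names introduced here, and the helper wraps each pattern in ^( )$ so that the whole input must match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name and the helper are illustrative only.
public static class MetacharacterDemo
{
    // Wraps the pattern in ^( )$ so the entire input must match it.
    // The parentheses keep alternation (|) inside the anchors.
    public static bool MatchesWhole(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWhole("A?B", "AB"));          // True: zero or one A, then B
        Console.WriteLine(MatchesWhole("A?B", "AAB"));         // False: ? allows at most one A
        Console.WriteLine(MatchesWhole("A*B", "B"));           // True: * allows zero A characters
        Console.WriteLine(MatchesWhole("A+B", "B"));           // False: + needs at least one A
        Console.WriteLine(MatchesWhole("A{2,4}", "AAA"));      // True: between two and four A characters
        Console.WriteLine(MatchesWhole("A{2,4}", "AAAAA"));    // False: five is more than the maximum
        Console.WriteLine(MatchesWhole("AAA|ABA|ABB", "ABA")); // True: second alternative
        Console.WriteLine(MatchesWhole("[^AbC]", "B"));        // True: B is not in the excluded set
        Console.WriteLine(MatchesWhole("[A-C]", "D"));         // False: D is outside the range
    }
}
```

Character classes are case sensitive by default, which is why [^AbC] accepts an uppercase B but would reject a lowercase b.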



The more complex the data you are trying to match, the more complex the regular expression syntax

becomes. For example, ensuring that input contains only numbers or is of a minimum length is trivial,

but ensuring a string contains a valid URL is extremely complex. Table 2-3 shows some examples of

regular expressions that match against commonly required data types.




Table 2-3. Commonly Used Regular Expressions

Input Type                       Description                                                   Regular Expression
Numeric input                    The input consists of one or more decimal digits; for         ^\d+$
                                 example, 5 or 5683874674.
Personal identification          The input consists of four decimal digits; for example,       ^\d{4}$
number (PIN)                     1234.
Simple password                  The input consists of six to eight word characters; for       ^\w{6,8}$
                                 example, ghtd6f or b8c7hogh.
Credit card number               The input consists of data that matches the pattern of        ^\d{4}-?\d{4}-?\d{4}-?\d{4}$
                                 most major credit card numbers; for example,
                                 4921835221552042 or 4921-8352-2155-2042.
E-mail address                   The input consists of an Internet e-mail address. The         ^[\w-]+@([\w-]+\.)+[\w-]+$
                                 [\w-]+ expression indicates that each address element
                                 must consist of one or more word characters or hyphens;
                                 for example, somebody@adatum.com.
HTTP or HTTPS URL                The input consists of an HTTP-based or HTTPS-based            ^https?://([\w-]+\.)+[\w-]+(/[\w./?%=]*)?$
                                 URL; for example, http://www.apress.com.
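To see how patterns of this kind behave, it can help to run them against the sample values from the table together with a few deliberately malformed inputs. The sketch below is illustrative only: the CommonPatterns class, its constant names, and the IsValid helper are hypothetical, and the credit card pattern is written with an optional hyphen between every group of four digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder for a few validation patterns; names are illustrative.
public static class CommonPatterns
{
    public const string Numeric = @"^\d+$";
    public const string Pin = @"^\d{4}$";
    public const string CreditCard = @"^\d{4}-?\d{4}-?\d{4}-?\d{4}$";
    public const string Email = @"^[\w-]+@([\w-]+\.)+[\w-]+$";

    // Returns true when the input matches the given pattern.
    public static bool IsValid(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid(Numeric, "5683874674"));             // True
        Console.WriteLine(IsValid(Numeric, "56a"));                    // False: contains a letter
        Console.WriteLine(IsValid(Pin, "1234"));                       // True
        Console.WriteLine(IsValid(Pin, "12345"));                      // False: five digits
        Console.WriteLine(IsValid(CreditCard, "4921835221552042"));    // True
        Console.WriteLine(IsValid(CreditCard, "4921-8352-2155-2042")); // True
        Console.WriteLine(IsValid(CreditCard, "4921-8352-2155"));      // False: only three groups
        Console.WriteLine(IsValid(Email, "somebody@adatum.com"));      // True
        Console.WriteLine(IsValid(Email, "somebody@adatum"));          // False: no dot in the domain
    }
}
```

A match only confirms that the input has the expected shape. It does not show that the card number, PIN, or address actually exists.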



Once you know the correct regular expression syntax, create a new System.Text.RegularExpressions.Regex object, passing a string containing the regular expression to the Regex constructor. Then call the IsMatch method of the Regex object and pass the string that you want to validate. IsMatch returns a bool value indicating whether the Regex object found a match in the string. The regular expression syntax determines whether the Regex object will match against only the full string or match against patterns contained within the string. (See the ^, \A, $, and \z entries in Table 2-2.)
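The difference between matching the full string and matching a pattern contained within it is easy to overlook when validating input. The following sketch, which uses a hypothetical AnchorDemo class and method names chosen for illustration, compares an unanchored four-digit pattern with two anchored forms. It also shows that $ accepts a single trailing newline, whereas \z does not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the method names are illustrative only.
public static class AnchorDemo
{
    // No anchors: succeeds if four consecutive digits appear anywhere.
    public static bool ContainsFourDigits(string input)
    {
        return new Regex(@"\d{4}").IsMatch(input);
    }

    // ^ and $: the input must be four digits, optionally followed by one newline.
    public static bool IsFourDigits(string input)
    {
        return new Regex(@"^\d{4}$").IsMatch(input);
    }

    // \A and \z: the input must be exactly four digits and nothing else.
    public static bool IsFourDigitsStrict(string input)
    {
        return new Regex(@"\A\d{4}\z").IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsFourDigits("abc12345xyz")); // True: 1234 is found inside
        Console.WriteLine(IsFourDigits("abc12345xyz"));       // False: extra characters
        Console.WriteLine(IsFourDigits("1234"));              // True
        Console.WriteLine(IsFourDigits("1234\n"));            // True: $ allows a final newline
        Console.WriteLine(IsFourDigitsStrict("1234\n"));      // False: \z does not
    }
}
```

For validation, an unanchored pattern is rarely what you want, because any input that merely contains a valid fragment will pass.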



The Code

The ValidateInput method shown in the following example tests any input string to see if it matches a

specified regular expression.

using System;
using System.Text.RegularExpressions;

namespace Apress.VisualCSharpRecipes.Chapter02
{
    class Recipe02_05
    {
        public static bool ValidateInput(string regex, string input)
        {
            // Create a new Regex based on the specified regular expression.
            Regex r = new Regex(regex);

            // Test if the specified input matches the regular expression.
            return r.IsMatch(input);
        }

        public static void Main(string[] args)
        {
            // Test the input from the command line. The first argument is the
            // regular expression, and the second is the input.
            Console.WriteLine("Regular Expression: {0}", args[0]);
            Console.WriteLine("Input: {0}", args[1]);
            Console.WriteLine("Valid = {0}", ValidateInput(args[0], args[1]));

            // Wait to continue.
            Console.WriteLine("\nMain method complete. Press Enter");
            Console.ReadLine();
        }
    }
}



Usage

To execute the example, run Recipe02-05.exe and pass the regular expression and data to test as command-line arguments. Enclose the regular expression in double quotes so that the Windows command prompt passes characters such as ^ through unchanged. For example, to test for a correctly formed e-mail address, type the following:

    Recipe02-05 "^[\w-]+@([\w-]+\.)+[\w-]+$" myname@mydomain.com

The result would be as follows:

    Regular Expression: ^[\w-]+@([\w-]+\.)+[\w-]+$
    Input: myname@mydomain.com
    Valid = True



Notes

You can use a Regex object repeatedly to test multiple strings, but you cannot change the regular

expression tested for by a Regex object. You must create a new Regex object to test for a different pattern.

Because the ValidateInput method creates a new Regex instance each time it’s called, you do not get the

ability to reuse the Regex object. As such, a more suitable alternative in this case would be to use a static

overload of the IsMatch method, as shown in the following variant of the ValidateInput method:

// Alternative version of the ValidateInput method that does not create
// Regex instances.
public static bool ValidateInput(string regex, string input)
{
    // Test if the specified input matches the regular expression.
    return Regex.IsMatch(input, regex);
}
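When the pattern is known in advance and many strings must be tested, the reuse described in these notes can be obtained by creating the Regex once and keeping it in a field. The following is a minimal sketch under that assumption. The ReusableValidator class, the PinRegex field, and the CountValid method are hypothetical names, and the four-digit PIN pattern is used only as an example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validator that reuses one Regex instance; names are illustrative.
public static class ReusableValidator
{
    // Created once when the class is first used, then shared by every call.
    private static readonly Regex PinRegex = new Regex(@"^\d{4}$");

    // Counts how many of the supplied strings match the PIN pattern.
    public static int CountValid(string[] inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            if (PinRegex.IsMatch(input))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string[] candidates = { "1234", "12a4", "0000", "12345" };

        // Only "1234" and "0000" consist of exactly four digits.
        Console.WriteLine("Valid PINs: {0}", CountValid(candidates));
    }
}
```

Because a Regex object cannot be changed after construction, a shared instance like this can safely be used for any number of inputs.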



2-6. Use Compiled Regular Expressions

Problem

You need to minimize the impact on application performance that arises from using complex regular

expressions frequently.



Solution

When you instantiate the System.Text.RegularExpressions.Regex object that represents your regular

expression, specify the Compiled option of the System.Text.RegularExpressions.RegexOptions

enumeration to compile the regular expression to Microsoft Intermediate Language (MSIL).



How It Works

By default, when you create a Regex object, the regular expression pattern you specify in the constructor

is compiled to an intermediate form (not MSIL). Each time you use the Regex object, the runtime

interprets the pattern’s intermediate form and applies it to the target string. With complex regular

expressions that are used frequently, this repeated interpretation process can have a detrimental effect

on the performance of your application.

By specifying the RegexOptions.Compiled option when you create a Regex object, you force the .NET

runtime to compile the regular expression to MSIL instead of the interpreted intermediary form. This

MSIL is just-in-time (JIT) compiled by the runtime to native machine code on first execution, just like

regular assembly code. You use a compiled regular expression in the same way as you use any Regex

object; compilation simply results in faster execution.
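Because a compiled regular expression pays its cost up front, it is usually created once and then reused. The sketch below illustrates that arrangement and is not part of the recipe: the CompiledEmailValidator class and its members are hypothetical names, and the e-mail pattern is used only as an example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validator built around one compiled Regex; names are illustrative.
public static class CompiledEmailValidator
{
    // RegexOptions.Compiled requests compilation to MSIL; the instance is
    // created once so the extra start-up cost is paid a single time.
    private static readonly Regex EmailRegex =
        new Regex(@"^[\w-]+@([\w-]+\.)+[\w-]+$", RegexOptions.Compiled);

    // Exposes the options the Regex was created with.
    public static RegexOptions Options
    {
        get { return EmailRegex.Options; }
    }

    // Used exactly like any other Regex object.
    public static bool IsEmail(string input)
    {
        return EmailRegex.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsEmail("myname@mydomain.com")); // True
        Console.WriteLine(IsEmail("myname@mydomain"));     // False: no dot in the domain
        Console.WriteLine((Options & RegexOptions.Compiled) != 0); // True
    }
}
```

Whether compilation pays off depends on how complex the pattern is and how often it runs, so it is worth measuring in your own application before adopting it.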

However, a couple of downsides offset the performance benefits provided by compiling regular expressions. First, the JIT compiler needs to do more work, which will introduce delays during JIT compilation. This is most noticeable if you create your compiled regular expressions as your application starts up. Second, the runtime cannot unload a compiled regular expression once you have finished with it. Unlike a normal regular expression, a compiled regular expression occupies memory that the runtime's garbage collector will not reclaim. The compiled regular expression will remain in memory until your program terminates or you unload the application domain in which the compiled regular expression is loaded.

As well as compiling regular expressions in memory, the static Regex.CompileToAssembly method

allows you to create a compiled regular expression and write it to an external assembly. This means that

you can create assemblies containing standard sets of regular expressions, which you can use from

multiple applications. To compile a regular expression and persist it to an assembly, take the following

steps:






1. Create a System.Text.RegularExpressions.RegexCompilationInfo array large enough to hold one RegexCompilationInfo object for each of the compiled regular expressions you want to create.

2. Create a RegexCompilationInfo object for each of the compiled regular expressions. Specify values for its properties as arguments to the object constructor. The following are the most commonly used properties:

   • IsPublic, a bool value that specifies whether the generated regular expression class has public visibility

   • Name, a String value that specifies the class name

   • Namespace, a String value that specifies the namespace of the class

   • Pattern, a String value that specifies the pattern that the regular expression will match (see recipe 2-5 for more details)

   • Options, a System.Text.RegularExpressions.RegexOptions value that specifies options for the regular expression

3. Create a System.Reflection.AssemblyName object. Configure it to represent the name of the assembly that the Regex.CompileToAssembly method will create.

4. Execute Regex.CompileToAssembly, passing the RegexCompilationInfo array and the AssemblyName object.



This process creates an assembly that contains one class declaration for each compiled regular

expression—each class derives from Regex. To use the compiled regular expression contained in the

assembly, instantiate the regular expression you want to use and call its methods as if you had simply

created it with the normal Regex constructor. (Remember to add a reference to the assembly when you

compile the code that uses the compiled regular expression classes.)



The Code

This line of code shows how to create a Regex object that is compiled to MSIL instead of the usual

intermediate form:

Regex reg = new Regex(@"[\w-]+@([\w-]+\.)+[\w-]+", RegexOptions.Compiled);

The following example shows how to create an assembly named MyRegEx.dll, which contains two

regular expressions named PinRegex and CreditCardRegex:

using System;
using System.Reflection;
using System.Text.RegularExpressions;

namespace Apress.VisualCSharpRecipes.Chapter02
{
    class Recipe02_06
    {


