The Java™ Tutorial

Trail: Bonus
Lesson: Regular Expressions

Methods of the Pattern Class

Until now, we've only used the test harness to create Pattern objects in their most basic form. This section explores advanced techniques such as creating patterns with flags and using embedded flag expressions. It also explores some remaining methods we haven't discussed yet.

Creating a Pattern with Flags

The Pattern class defines an alternate compile method that accepts a set of flags affecting the way the pattern is matched. The flags parameter is a bit mask that may include any of the following public static fields: Pattern.CANON_EQ, Pattern.CASE_INSENSITIVE, Pattern.COMMENTS, Pattern.DOTALL, Pattern.MULTILINE, Pattern.UNICODE_CASE, and Pattern.UNIX_LINES.

In the following steps we will modify the test harness, RegexTestHarness.java, to create a pattern with case-insensitive matching.

First, modify the code to call the alternate version of compile:

pattern = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
Then edit your input file, regex.txt, to contain the following:
 
dog
DoGDOg
Finally, compile and run the test harness to get the following results:
 
Current REGEX is: dog
Current INPUT is: DoGDOg
I found the text "DoG" starting at index 0 and ending at index 3.
I found the text "DOg" starting at index 3 and ending at index 6.
As you can see, the string literal "dog" matches both occurrences, regardless of case. To compile a pattern with multiple flags, separate the flags to be included using the bitwise OR operator (|):
 
pattern = Pattern.compile("[az]$", Pattern.MULTILINE | Pattern.UNIX_LINES);
Note that for clarity you could also specify an int variable:
 
final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
Pattern pattern = Pattern.compile("aa", flags);
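
The two-argument compile method can also be tried without the test harness. The following is a minimal sketch, not part of the tutorial's own sources: the class name CombinedFlagsDemo, the helper findAll, the regex, and the input string are all assumptions chosen for illustration. It combines Pattern.CASE_INSENSITIVE and Pattern.MULTILINE with the bitwise OR operator and compares the result with a pattern compiled without flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
public final class CombinedFlagsDemo {

    // Returns every match of regex in input as "text@start-end",
    // with the pattern compiled using the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        Pattern pattern = Pattern.compile(regex, flags);
        Matcher matcher = pattern.matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return matches;
    }

    public static void main(String[] args) {
        final int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        String input = "Dog\ncat\nDOG";

        // With MULTILINE, ^ and $ also match at line boundaries inside the input.
        System.out.println(findAll("^dog$", flags, input));

        // With no flags (0), the same regex finds nothing in this input.
        System.out.println(findAll("^dog$", 0, input));
    }
}
```

Running this sketch should print [Dog@0-3, DOG@8-11] followed by [], which suggests that both flags took effect in the first call.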

Embedded Flag Expressions

It's also possible to enable various flags using embedded flag expressions. Embedded flag expressions are an alternative to the two-argument version of compile, and are specified in the regular expression itself. The following example uses the original test harness, RegexTestHarness.java, with the embedded flag expression (?i) to enable case-insensitive matching.
 
Current REGEX is: (?i)foo
Current INPUT is: FOOfooFoOfoO
I found the text "FOO" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "FoO" starting at index 6 and ending at index 9.
I found the text "foO" starting at index 9 and ending at index 12.
Once again, all matches succeed regardless of case.

The embedded flag expressions that correspond to Pattern's publicly accessible fields are presented in the following table:

 Constant                  Equivalent Embedded Flag Expression
 Pattern.CANON_EQ          None
 Pattern.CASE_INSENSITIVE  (?i)
 Pattern.COMMENTS          (?x)
 Pattern.MULTILINE         (?m)
 Pattern.DOTALL            (?s)
 Pattern.UNICODE_CASE      (?u)
 Pattern.UNIX_LINES        (?d)
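
To illustrate the correspondence between a constant and its embedded flag expression, here is a small sketch that compiles the same regex three ways: with (?i), with Pattern.CASE_INSENSITIVE, and with no flag at all. The class name EmbeddedFlagDemo and the helper countMatches are hypothetical names introduced for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative assumptions.
public final class EmbeddedFlagDemo {

    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "FOOfooFoOfoO";

        Pattern embedded = Pattern.compile("(?i)foo");                      // flag inside the regex
        Pattern flagged = Pattern.compile("foo", Pattern.CASE_INSENSITIVE); // flag passed to compile
        Pattern plain = Pattern.compile("foo");                             // case-sensitive

        System.out.println(countMatches(embedded, input)); // 4
        System.out.println(countMatches(flagged, input));  // 4
        System.out.println(countMatches(plain, input));    // 1
    }
}
```

The first two patterns find the same four matches, while the case-sensitive pattern finds only the lowercase "foo".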

Using the matches(String,CharSequence) Method

The Pattern class defines a convenient matches method that allows you to quickly check whether an entire input string matches a given pattern. As with all public static methods, you should call matches with its class name, such as Pattern.matches("\\d", "1"). In this example, the method returns true, because the digit "1" matches the regular expression \d.
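
Because matches tests the whole input rather than searching inside it, it can help to compare it with Matcher.find. The following is a minimal sketch; the class name MatchesDemo and its two helper methods are hypothetical and exist only for this comparison.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting whole-input matching with searching.
public final class MatchesDemo {

    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches anywhere inside the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\d", "1"));    // true: the whole input is one digit
        System.out.println(matchesWhole("\\d", "12"));   // false: two digits, regex allows one
        System.out.println(matchesWhole("\\d", "a1b"));  // false: input is not only a digit
        System.out.println(containsMatch("\\d", "a1b")); // true: a digit occurs in the input
    }
}
```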

Using the split(CharSequence) Method

The split method is a great tool for gathering the text that lies on either side of the pattern that's been matched. As shown below in SplitTest.java, the split method can extract the words "one two three four five" from the string "one:two:three:four:five":
 
import java.util.regex.*;

public final class SplitTest {

    private static final String REGEX = ":";
    private static final String INPUT = "one:two:three:four:five";
    
    public static void main(String[] argv) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(int i=0;i<items.length;i++) {
            System.out.println(items[i]);
        }
    }
}

OUTPUT:

one
two
three
four
five
For simplicity, we've matched a string literal, the colon (:), instead of a complex regular expression. Because split works with any compiled Pattern, you can use it to get the text that falls on either side of any regular expression. Here's the same example, SplitTest2.java, modified to split on digits instead:
 
import java.util.regex.*;

public final class SplitTest2 {

    private static final String REGEX = "\\d";
    private static final String INPUT = "one9two4three7four1five";

    public static void main(String[] argv) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(int i=0;i<items.length;i++) {
            System.out.println(items[i]);
        }
    }
}

OUTPUT:

one
two
three
four
five
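
The same approach extends to delimiters that are longer than one character. As a hedged sketch, the class below (SplitRegexDemo, a name invented for this illustration, along with its regex and input) splits on a comma or semicolon together with any surrounding whitespace.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the delimiter regex and input are assumptions.
public final class SplitRegexDemo {

    // A comma or semicolon, plus any whitespace on either side of it.
    private static final Pattern DELIMITER = Pattern.compile("\\s*[,;]\\s*");

    // Splits the input around every match of DELIMITER.
    static String[] splitFields(String input) {
        return DELIMITER.split(input);
    }

    public static void main(String[] args) {
        String[] items = splitFields("one, two;three ,four ; five");
        System.out.println(Arrays.toString(items)); // [one, two, three, four, five]
    }
}
```

Folding the whitespace into the delimiter means the resulting items need no separate trimming step.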

Pattern Method Equivalents in java.lang.String

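Regular expression support has also been introduced to java.lang.String through several methods that mimic the behavior of java.util.regex.Pattern. These include matches(String regex), split(String regex), split(String regex, int limit), replaceFirst(String regex, String replacement), and replaceAll(String regex, String replacement).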
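
As a brief illustration of these String methods, the sketch below repeats the earlier Pattern examples using String.matches, String.split, and String.replaceAll. The class name StringRegexDemo and its helper methods are hypothetical and were chosen only for this example.

```java
import java.util.Arrays;

// Hypothetical demo class showing String methods that accept a regex.
public final class StringRegexDemo {

    // Same whole-input check as Pattern.matches("\\d", input).
    static boolean isSingleDigit(String input) {
        return input.matches("\\d");
    }

    // Same result as Pattern.compile(":").split(input).
    static String[] splitOnColon(String input) {
        return input.split(":");
    }

    // Replaces every digit in the input with a hyphen.
    static String digitsToHyphens(String input) {
        return input.replaceAll("\\d", "-");
    }

    public static void main(String[] args) {
        System.out.println(isSingleDigit("1"));                                    // true
        System.out.println(Arrays.toString(splitOnColon("one:two:three")));       // [one, two, three]
        System.out.println(digitsToHyphens("one9two4three7four1five"));           // one-two-three-four-five
    }
}
```

These String methods compile the regex on each call, so a Pattern compiled once is usually the better choice when the same expression is applied many times.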



Copyright 1995-2004 Sun Microsystems, Inc. All rights reserved.