Java Regular Expressions Tutorial

Abhishek Srivastava
Jun 20, 2021

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.

Here we will learn about regular expressions by walking through a program that removes duplicate words from a string.

First comes the full program, then a description of each numbered line of code.

package com.farenda.java.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String input = "We we love love Android";

        // 1: a word followed by one or more repetitions of itself
        String regex = "\\b(\\w+)(\\s+\\1\\b)+";

        // 2: use compile(regex) alone if you want case-sensitive matching
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        // 3: the matcher scans the original input string
        Matcher m = p.matcher(input);
        // 4: visit every run of duplicated words
        while (m.find()) {
            // 5: replace the whole run with a single copy of the word
            input = input.replaceAll(m.group(), m.group(1));
        }

        System.out.println(input);
    }
}

The above code produces the following output:

We love Android

Now let’s go through each numbered line and what it does. Point 1 defines a regex that finds duplicate words in a string.

String regex = "\\b(\\w+)(\\s+\\1\\b)+";

What all that means:

  1. \b: look for a word boundary, so the match can start only at the beginning of a word and not somewhere in the middle;
  2. (\w+): match one or more word characters and remember them as a group (the parentheses), which we can refer to later by number; this matches a complete word and remembers it;
  3. \s+: match one or more whitespace characters;
  4. \1: match the same text that group 1 remembered in step 2;
  5. \b: as in step 1, a word boundary, here making sure the repeated word is not part of some longer word;
  6. (\s+\1\b)+: match one or more repetitions of the word captured in step 2, each preceded by whitespace.
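To see the two word boundaries at work, here is a minimal sketch that collects every run of repeated words the pattern finds. The class name DuplicateRunFinder and the helper findRuns are hypothetical names chosen for this illustration, and the pattern is compiled without the case flag to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class DuplicateRunFinder {

    // Same regex as in point 1. Backslashes are doubled because this is
    // a Java string literal; the regex engine sees \b(\w+)(\s+\1\b)+
    static final Pattern DUPLICATES = Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+");

    // Returns every run of repeated words found in the text.
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = DUPLICATES.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("it is is a a a test")); // [is is, a a a]
        System.out.println(findRuns("this is"));             // [] (leading \b blocks "is" inside "this")
        System.out.println(findRuns("is island"));           // [] (trailing \b blocks "is" inside "island")
    }
}
```

Without the first \b, "this is" would match starting in the middle of "this"; without the second, "is island" would match up to the "is" of "island".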

Now let’s discuss point 2

Pattern p = Pattern.compile(regex,Pattern.CASE_INSENSITIVE);
  • Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

If you want to match words in a case-insensitive way, compile the regular expression above with the CASE_INSENSITIVE flag, as this program does. For case-sensitive matching, call compile(regex) without the flag.
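The effect of the flag can be checked with a small sketch like the one below. CaseFlagDemo and hasDuplicate are hypothetical names used only for this illustration; the regex is the one from point 1.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class CaseFlagDemo {

    static final String REGEX = "\\b(\\w+)(\\s+\\1\\b)+";

    // Returns true if the text contains at least one repeated word.
    static boolean hasDuplicate(String text, boolean ignoreCase) {
        Pattern p = ignoreCase
                ? Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE)
                : Pattern.compile(REGEX);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // "We" and "we" differ only in case.
        System.out.println(hasDuplicate("We we love Android", false)); // false
        System.out.println(hasDuplicate("We we love Android", true));  // true
    }
}
```

With the flag set, the backreference \1 also compares case-insensitively, which is why "We we" counts as a repeated word.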

Let’s discuss point 3

Matcher m = p.matcher(input);
  • Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
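As a rough illustration of how a Matcher walks through its input, the sketch below records where each match starts and ends, using the standard find(), start(), end() and group() methods. MatcherDemo and describeMatches are hypothetical names introduced for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class MatcherDemo {

    // Describes each match as "start-end:text", separated by ", ".
    static String describeMatches(Pattern p, String input) {
        Matcher m = p.matcher(input); // a Matcher is always obtained from a Pattern
        StringBuilder sb = new StringBuilder();
        while (m.find()) {            // each find() call moves on to the next match
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(m.start()).append('-').append(m.end())
              .append(':').append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);
        System.out.println(describeMatches(p, "We we love love Android"));
        // 0-5:We we, 6-15:love love
    }
}
```

Note that end() is exclusive: the first match covers indexes 0 through 4.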

Finally, let’s discuss points 4 and 5

while (m.find()) {
input = input.replaceAll(m.group(), m.group(1));
}

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters “d”, “o”, and “g”.

Capturing groups are numbered by counting their opening parentheses from the left to the right. In the expression ((A)(B(C))), for example, there are four such groups −

  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)

To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher’s pattern.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
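The numbering can be verified with a short sketch that matches the expression ((A)(B(C))) against the string "ABC" and lists every group, including group 0. GroupNumberingDemo, countGroups and listGroups are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class GroupNumberingDemo {

    // Number of capturing groups in the regex (group 0 is not counted).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Lists "index=text" for group 0 and every capturing group,
    // or an empty list if the whole input does not match.
    static List<String> listGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(i + "=" + m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(countGroups("((A)(B(C)))"));       // 4
        System.out.println(listGroups("((A)(B(C)))", "ABC")); // [0=ABC, 1=ABC, 2=A, 3=BC, 4=C]
    }
}
```

Groups 0 and 1 hold the same text here only because the outermost parentheses wrap the entire expression.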

  • The while loop visits every match of the regular expression defined above and replaces the whole matched text (m.group()) with the content of the first captured group (m.group(1)), which is the single word.
  • When applied to the input string, m.group() and m.group(1) have the following values in successive iterations of the while loop:
  • m.group(): “We we” and m.group(1): “We”
  • m.group(): “love love” and m.group(1): “love”
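The loop works here because the matched text contains only word characters and whitespace, so it is safe to pass to String.replaceAll, which treats its first argument as a regex. A commonly used alternative, shown below as a sketch and not as the approach taken in the program above, is to let the Matcher do the replacement itself with the "$1" group reference. DuplicateWordRemover and removeDuplicates are hypothetical names for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; an alternative to the while loop in points 4 and 5.
class DuplicateWordRemover {

    static final Pattern DUPLICATES =
            Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replaces every run of repeated words with its first word.
    // "$1" in the replacement string refers to capturing group 1.
    static String removeDuplicates(String input) {
        return DUPLICATES.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("We we love love Android")); // We love Android
    }
}
```

This version makes a single pass over the input and keeps the first spelling of each repeated word, so "Hello hello  HELLO world" becomes "Hello world".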

Thanks for reading…

