DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

能够进行多段文本匹配的NFA改良算法


下面的代码基于NFA算法实现了在多段字符串中匹配正则表达式, 对比NFA算法可以看到它将pc由局部变量提升为类成员,以保存中间匹配状态, 另外在匹配成功后将pc恢复到null状态。实际使用中, 此类还应该增加一个"reset"方法,将pc值置为null,以便用户可以显式要求重新开始匹配。

public class MultiSegmentNFA {
  private final Digraph G; // digraph of epsilon transitions
  private final String regexp; // regular expression
  private final int M; // number of characters in regular expression
  private Bag<Integer> pc = null;
  public MultiSegmentNFA(String regexp) {
   ... // same as NFA
  }
  public boolean recognizes(String target) {
    if (pc == null) {
      DirectedDFS dfs = new DirectedDFS(G, 0);
      pc = new Bag<Integer>();
      for (int v = 0; v < G.V(); v++)
        if (dfs.marked(v)) pc.add(v);
    }
    for (int i = 0; i < target.length(); i++) {
      Bag<Integer> match = new Bag<Integer>();
      for (int v : pc) {
       if (v == M)
        continue;
       if ((regexp.charAt(v) == target.charAt(i))
         || regexp.charAt(v) == '.')
        match.add(v + 1);
      }
      DirectedDFS dfs = new DirectedDFS(G, match);
      pc = new Bag<Integer>();
      for (int v = 0; v < G.V(); v++)
        if (dfs.marked(v)) pc.add(v);
      if (pc.size() == 0) return false;
    }
    for (int v : pc)
      if (v == M) {
        pc = null;
        return true;
      }
    return false;
  }

  public static void main(String[] args) {
    LinkedList<String> msgs = new LinkedList<String>();
    msgs.offer("welcome lonely logoout");
    msgs.offer("to flog");
    msgs.offer("into a fog");
    String token = "outto";  // 这个目标由第1和第2个字符串拼接而成
    String pat = "(.*" + token + ".*)";
    MultiSegmentNFA mnfa = new MultiSegmentNFA(pat);
    String target = msgs.poll();
    while (target != null) {
      if (mnfa.recognizes(target)) {
        break;
      }
      target = msgs.poll();
    }
    if (target == null) {
      System.out.println("cannot find " + token + " in msgs.");
    } else {
      System.out.println("find pat in <" + target + ">");
    }
  }
}

Note: Java的Pattern类使用的就是基于NFA的搜索算法, 见JDK 6文档java.util.regex.Pattern的"Comparison to Perl 5"一节。



Published

Mar 31, 2013

Last Updated

Apr 2, 2018

Category

Tech

Tags

  • algorithm 9
  • Java 106
  • nfa 2

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor