diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 15:26:00 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 15:26:00 +0000 |
commit | 830407e88f9d40d954356c3754f2647f91d5c06a (patch) | |
tree | d6a0ece6feea91f3c656166dbaa884ef8a29740e /modules/policy/lua-aho-corasick/README.md | |
parent | Initial commit. (diff) | |
download | knot-resolver-830407e88f9d40d954356c3754f2647f91d5c06a.tar.xz knot-resolver-830407e88f9d40d954356c3754f2647f91d5c06a.zip |
Adding upstream version 5.6.0.upstream/5.6.0upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'modules/policy/lua-aho-corasick/README.md')
-rw-r--r-- | modules/policy/lua-aho-corasick/README.md | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/modules/policy/lua-aho-corasick/README.md b/modules/policy/lua-aho-corasick/README.md new file mode 100644 index 0000000..b5cc406 --- /dev/null +++ b/modules/policy/lua-aho-corasick/README.md @@ -0,0 +1,40 @@ +aho-corasick-lua +================ + +C++ and Lua Implementation of the Aho-Corasick (AC) string matching algorithm +(http://dl.acm.org/citation.cfm?id=360855). + +We began with pure Lua implementation and realize the performance is not +satisfactory. So we switch to C/C++ implementation. + +There are two shared objects provied by this package: libac.so and ahocorasick.so +The former is a regular shared object which can be directly used by C/C++ +application, or by Lua via FFI; and the later is a Lua module. An example usage +is shown bellow: + +```lua +local ac = require "ahocorasick" +local dict = {"string1", "string", "etc"} +local acinst = ac.create(dict) +local r = ac.match(acinst, "mystring") +``` + +For efficiency reasons, the implementation is slightly different from the +standard AC algorithm in that it doesn't return a set of strings in the dictionary +that match the given string, instead it only returns one of them in case the string +matches. The functionality of our implementation can be (precisely) described by +following pseudo-c snippet. + +```C +string foo(input-string, dictionary) { + string ret = the-end-of-input-string; + for each string s in dictionary { + // find the first occurrence match sub-string. + ret = min(ret, strstr(input-string, s); + } + return ret; +} +``` + +It's pretty easy to get rid of this limitation, just to associate each state with +a spare bit-vector dipicting the set of strings recognized by that state. |