[nycbug-talk] Some history of Unix utilities

Dan Cross crossd at gmail.com
Thu Aug 8 10:01:19 EDT 2013


On Tue, Aug 6, 2013 at 1:34 PM, Jimmy B. <jpb at jimby.name> wrote:

> I remember reading the announcements on Usenet in the 1980's about
> Henry Spencer's regex library and thinking how *way cool* that was.
> The library turned up in quite a few places.  I seem to recall it
> was used in quite a few applications of that era, including NNTP
> and early versions of the X window protocol and Perl itself.
> (Please correct me if my memory has turned against me :-))
>
> As much as I liked it, I was even happier when PCRE hit the bitwaves.
> Much simpler, and more powerful.
>

It is actually a bit of a shame; Henry Spencer's library used the
unfortunate technique of backtracking to implement search.  I say
unfortunate because backtracking can (and does) go exponential in both time
and space, as opposed to, say, a non-deterministic finite automata
simulation which has superlinear, but not exponential runtime.

PCRE took advantage of backtracking and extended "regular" expressions
(which are things that, by definition, can be implemented as deterministic
finite automata *without additional memory*.  Corollary: matches against
DFA's run in constant memory and time linear in the length of the input
being searched for a match) with things that cannot be implemented on
finite automata, e.g., backreferences in expressions such as
'(foo|bar).*\1' to match 'foo.*foo' or 'bar.*bar', but not 'foo.*bar' or
'bar.*foo', which extra state.  Note that these are no longer regular
expressions in the formal sense; instead, they are somewhere between
regular expressions and things expressible via push-down automata.  (Note:
the author of the book, "Mastering Regular Expressions" gets this wrong and
claims that backtracking implementations are NDFAs.)

Ken Thompson had done some important work in QED on CTSS to compile regular
expressions to machine code that implemented an NDFA simulation.  Unix's ed
didn't do this, nor did the first grep's when they were liberated from ed,
hence, Unix grep can support limited back-references in a manner similar to
PCRE.

Some modern libraries implement DFA and NDFA matching, but necessarily omit
implementing things like backreferences.  RE2 is an example.

This is a great introduction to the fascinating subject of regular
expressions, and their history and implementation in Unix, Unix-like, and
ancestor systems: http://swtch.com/~rsc/regexp/

        - Dan C.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.nycbug.org/pipermail/talk/attachments/20130808/9e9b1a54/attachment.html>


More information about the talk mailing list