Talk:Perl Compatible Regular Expressions

From Wikipedia, the free encyclopedia

PCRE vs PCRE2

There is a newer version of the library, called PCRE2, with the 'original' now in maintenance. Some information on the merits and issues of PCRE2 would be beneficial, though I recognise that Wikipedia may not be the appropriate place for this if it is not covered on the project website (I could not find any obvious information on the differences). — Preceding unsigned comment added by BlueSulla (talk • contribs) 10:48, 12 January 2017 (UTC)

Untitled

Shouldn't it be "Perl"?

Shouldn't what be "Perl"?

Misnomer

From the article:

The name is a misnomer, because Perl's regular expressions permit embedded Perl code during matching and replacing, so to be truly "Perl Compatible" would need a full Perl interpreter embedded in the library. PCRE contains no such interpreter.

Um...it's not a misnomer. The regular expressions of PCRE are compatible with Perl's. Embedded source code does not constitute a "regular expression pattern". 70.20.145.231 19:57, 26 November 2005 (UTC)

No, they're not exactly compatible. Embedded Perl source code is considered part of the regular expression in Perl. PCRE will always do things that Perl can't, and vice versa. Randal L. Schwartz 21:27, 26 November 2005 (UTC)
I disagree that it's a misnomer. PCRE deliberately tries to be feature compatible with Perl. And, at least recently, Perl has tried to become feature compatible with PCRE. The fact is that in Perl 5.9.x active efforts have been made to make Perl support the PCRE syntax extensions, and in PCRE 7.x Philip has made active efforts to keep pace with new features being added to Perl 5.9.x. Also, I think that the whole quibble about (??{}) is unreasonable. In all production versions of Perl the feature is marked experimental, and I don't feel that supporting experimental features is required for compatibility. Do we expect PCRE to maintain bug compatibility as well? Demerphq 18:59, 14 May 2007 (UTC)

That's a very silly thing to put in an encyclopaedia article in that way. It sounds like a silly sling against PCRE. Nothing is compatible with anything. Come on. 208.115.233.109 14:47, 23 March 2007 (UTC)

I cannot be bothered to analyze the above, but there's something wrong with this part of the article right now. It says "The name is therefore a misnomer, because PCRE is 'Perl Compatible' only if [...]". The "therefore" appears to be a back-reference (pun intended) to a part that goes "[PCRE] is much more powerful and flexible than POSIX regular expressions." And that doesn't make sense, because the same applies to Perl REs. JöG 10:58, 5 July 2007 (UTC)

PCRE_EXTENDED

Does this need to be described in the article?

See "CONDITIONAL SUBPATTERNS".

Recursive expressions

To Randal and anyone else interested: we say:

PCRE has developed a unique feature set, which in some cases includes features not yet available even in Perl's regular expression engine. For example, recursive subpatterns will be a feature of an upcoming 2006 release of Perl 5, but are already available in PCRE

Our citation for this is the PCRE manual which says:

Fairly obviously, PCRE does not support the (?{code}) and (??{code}) constructions. However, there is support for recursive patterns. This is not available in Perl 5.8, but will be in Perl 5.10.

This is pretty clear. PCRE doesn't support Perl's inline code à la ?{print "foo"} or ??{$regexp}, but does support recursive subexpressions, a feature which is being added to 5.10. I've been following this in Perl and PCRE, so I'm pretty sure of this. -Harmil 00:09, 14 February 2007 (UTC)

As an example of how PCRE and Perl regexes differ, it's a bad one. Perhaps a simpler example that isn't so convoluted to explain would be handy. --Randal L. Schwartz 21:12, 16 February 2007 (UTC)
Ok, rather than trying to expand and clarify the example to the point of distracting from the topic at hand, I've extracted the example and simply put the conclusion in: the development of new features in both languages is coordinated, but unique. I think that sums up the situation, and is supported by the ref, without getting bogged down. I still think that the recursive sub-pattern feature (while it can be emulated by variable or code interpolation via ?{...} and ??{...}) is distinct and unique, but if there's a debate to be had around it, Wikipedia's not the place to have that debate, so I'll stand down. -Harmil 21:33, 16 February 2007 (UTC)

Distinction between syntax and implementation

I think the problem with this article is that it merges PCRE as a syntax with the PCRE library. A regular expression dialect is IMO generally considered to be "Perl Compatible" if it follows the general syntax that Larry Wall devised for Perl, with such features as the consistent escaping rules and the (?...) notation for special patterns, particularly (?:...) for noncapturing grouping. The PCRE library is therefore just one of several Perl compatible regular expression engines, albeit the most widely used in other software.

I think distinguishing the two would make things clearer. It would also make it easier to explain that various implementations have influenced each other in terms of syntax. For instance, PCRE 7 supports the .NET syntax for named buffer declaration, as Perl 5.9.x has decided to support both the .NET syntax and the Python syntax. Similarly, features from a Java "perl compatible" regular expression engine have been added to Perl 5.9.x and therefore also added to PCRE 7; vice versa, features that first saw implementation in PCRE 7 have now been added to Perl 5.9.x. Demerphq 19:56, 14 May 2007 (UTC)
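For readers unfamiliar with the two spellings mentioned above, here is a minimal sketch using Python's standard re module, which is a separate engine from PCRE and is used here only as a stand-in. The pattern, the sample text and both function names are made up for illustration. Python's own module is expected to accept its (?P<name>...) form and to reject the .NET-style (?<name>...) form, which is the kind of dialect difference under discussion.

```python
import re

# Python-style named group: the (?P<name>...) spelling.
# The pattern itself is a hypothetical example, not taken from the article.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")


def parse_year_month(text):
    """Return (year, month) from the first YYYY-MM found in text, or None."""
    m = DATE.search(text)
    if m is None:
        return None
    return m.group("year"), m.group("month")


def dotnet_style_supported():
    """Report whether this engine accepts the .NET-style (?<name>...) spelling."""
    try:
        re.compile(r"(?<year>\d{4})")
    except re.error:
        return False
    return True
```

Calling parse_year_month("released 2007-05") returns the two named captures as a tuple, while dotnet_style_supported() reports how the running engine treats the other spelling.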

Programs that use PCRE / Language bindings for PCRE

I've started this section so that authors of programs, libraries, frameworks, language bindings, etc. that use PCRE can highlight that fact here. I do this primarily because I don't think the author of such a program should be editing the primary entry, as they have a vested interest in the subject. Adding the entry here allows the author to make editors of the page aware of programs or projects that may be relevant, and allows an unbiased third party to update the main entry as appropriate.

Project: RegexKit url: http://regexkit.sourceforge.net/ Description: An Objective-C Framework for Regular Expressions using the PCRE Library for Mac OS X Cocoa and GNUstep. License: BSD. Language: Objective-C. —Preceding unsigned comment added by 205.150.102.219 (talk) 23:40, 26 January 2008 (UTC)

Recursive atomicity

Is there any simpler example illustrating this difference between Perl regexen and PCREs than the one used now: "<<!>!>!>><>>!>!>!>" =~ /^(<(?:[^<>]+|(?3)|(?1))*>)()(!>!>!>)$/

I'm quite familiar with Perl's REs and moderately familiar with PCREs, but it would take me at least three cups of coffee to determine what that actually means, if anything. If there isn't a simpler example, can we just say there are pathological cases that behave differently because of this distinction and leave it to a cited source to go into the details? Keziah (talk) 21:09, 1 September 2010 (UTC)

There are many simpler examples. For instance, "aa" =~ /^(|a)(?1)$/ successfully matches in Perl but fails in PCRE. When backtracking occurs and the second alternative in the group ('a') is tried at the top level, (?1) will match the empty string, but the overall match would fail trying to match $ for the second time. Since the recursed subpattern is effectively atomic, the group cannot be re-entered at this point, and the match fails. Jaytea1 (talk) 17:40, 26 June 2011 (UTC)
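The backtracking argument above can be tried out with a rough emulation in Python's standard re module. This is only a sketch under stated assumptions: Python's re has no (?1) recursion, so the recursive call is written out as a second copy of the group, and the "effectively atomic" behaviour is imitated with the common lookahead-plus-backreference idiom. The names BACKTRACKING, ATOMIC and matches are hypothetical, and neither pattern is run by Perl or PCRE.

```python
import re

# Assumption: the recursive call (?1) is replaced by a literal second copy
# of the group (|a), because Python's re module has no recursion syntax.

# Backtrackable copy: the engine may go back into the second group and try
# its 'a' alternative, as in the Perl behaviour described above.
BACKTRACKING = re.compile(r"^(|a)(|a)$")

# Atomic copy: a lookahead is not re-entered once it has succeeded, so
# (?=(|a))\2 commits to the first alternative that matches (the empty
# string), imitating the atomic recursion described above.
ATOMIC = re.compile(r"^(|a)(?=(|a))\2$")


def matches(pattern, text):
    """Return True if the compiled pattern matches text from the start."""
    return pattern.match(text) is not None
```

With these definitions, matches(BACKTRACKING, "aa") succeeds while matches(ATOMIC, "aa") does not, and both accept the single-character subject "a", which mirrors the step-by-step reasoning in the comment.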

Huh?

There should be at least one example of usage. E.g., I have some C string variable (char *) that could possibly be matched using a RE. How do I do it? What happens? --72.70.91.223 (talk) 08:16, 9 July 2014 (UTC)

pcre syntax highlighting lost

Since the switch from GeSHi to Pygments for syntax highlighting (phab:T85794), support for 'pcre' was unfortunately dropped, as can be seen from the plain-text formatting on this page and many others such as Regular expression, Sentence boundary disambiguation, String literal, Leaning toothpick syndrome and Thompson's construction algorithm. If we want specialised 'pcre' syntax highlighting support again, it will need to be added to Pygments. However, we may be able to find a suitable fallback already available in Pygments, which can be added to the software like this patch. Both lang=perl and lang=nginx look better than lang=text for every use of pcre that I can see. John Vandenberg (chat) 22:57, 12 July 2015 (UTC)

Non-capturing matching groups

As mentioned a little above, (?:...) defines a non-capturing group. If that is the case in PCRE, it should be mentioned on the page. Urhixidur (talk) 19:49, 22 January 2018 (UTC)
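As a minimal illustration of what the (?:...) notation does, here is a sketch in Python's standard re module, which shares this piece of Perl-style syntax but is a separate engine from PCRE. The patterns and the helper name group_info are made up for the example.

```python
import re

# Same pattern twice: once with an ordinary capturing group around "ab",
# once with the non-capturing (?:...) form.
CAPTURING = re.compile(r"(ab)+(c)")
NON_CAPTURING = re.compile(r"(?:ab)+(c)")


def group_info(pattern, text):
    """Return (number of capture groups, captured strings), or None if no match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    return pattern.groups, m.groups()
```

Both patterns accept the same strings; the difference is only in numbering. For "ababc" the capturing version reports two groups ("ab" and "c"), while the non-capturing version reports one group ("c"), so (c) becomes group 1.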

Formatting experiment

[ 218.190.230.240 @ CE 2021-03-14 14:15 UTC:
https://en.wikipedia.org/?diff=prev&oldid=

To:
|*| https://en.wikipedia.org/wiki/User:Sebastian_Hudak
|*| https://en.wikipedia.org/wiki/User:1234qwer1234qwer4
|*| https://en.wikipedia.org/wiki/User:WikiCleanerBot

Didn't you all realize that, after all your hardly verified "normalizations", the text has become significantly harder to parse?

Compare the 2:
|*| https://en.wikipedia.org/?oldid=999160357
|*| https://en.wikipedia.org/?diff=992289898&oldid=999160357




[ Quote Sebastian Hudak @ CE 2020-12-28 01:05 UTC:
https://en.wikipedia.org/?diff=prev&oldid=996675467

removed all of the empty lines separating things in the article ]

The amount of line spacing carries semantic meaning: it's intended to create context isolation. (to provide smooth seeking; in particular, easy random access)


[ Quote WikiCleanerBot @ CE 2021-01-08 19:55 UTC:
https://en.wikipedia.org/?diff=prev&oldid=999160357

Fix errors for CW project (Reference before punctuation) ]

Placing the Linker (or Extra Description) before or after the punctuation carries a different semantic meaning: it applies either to part of the statement (a sub-clause or component) or to the entire statement. ]