Qt
Internal/Contributor docs for the Qt SDK. Note: These are NOT official API docs; those are found at https://doc.qt.io/
qregularexpression.cpp
1// Copyright (C) 2020 Giuseppe D'Angelo <dangelog@gmail.com>.
2// Copyright (C) 2020 Klarälvdalens Datakonsult AB, a KDAB Group company, info@kdab.com, author Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
3// Copyright (C) 2021 The Qt Company Ltd.
4// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only
5// Qt-Security score:critical reason:data-parser
6
8
9#include <QtCore/qcoreapplication.h>
10#include <QtCore/qhashfunctions.h>
11#include <QtCore/qlist.h>
12#include <QtCore/qmutex.h>
13#include <QtCore/qstringlist.h>
14#include <QtCore/qdebug.h>
15#include <QtCore/qglobal.h>
16#include <QtCore/qatomic.h>
17#include <QtCore/qdatastream.h>
18
19#if defined(Q_OS_MACOS)
20#include <QtCore/private/qcore_mac_p.h>
21#endif
22
23#define PCRE2_CODE_UNIT_WIDTH 16
24
25#include <pcre2.h>
26
27QT_BEGIN_NAMESPACE
28
29using namespace Qt::StringLiterals;
30
31/*!
32 \class QRegularExpression
33 \inmodule QtCore
34 \reentrant
35
36 \brief The QRegularExpression class provides pattern matching using regular
37 expressions.
38
39 \since 5.0
40
41 \ingroup tools
42 \ingroup shared
43 \ingroup string-processing
44
45 \keyword regular expression
46
47 \compares equality
48 Regular expressions, or \e{regexps}, are a very powerful tool to handle
49 strings and texts. This is useful in many contexts, e.g.,
50
51 \table
52 \row \li Validation
53 \li A regexp can test whether a substring meets some criteria,
54 e.g. is an integer or contains no whitespace.
55 \row \li Searching
56 \li A regexp provides more powerful pattern matching than
57 simple substring matching, e.g., match one of the words
58 \e{mail}, \e{letter} or \e{correspondence}, but none of the
59 words \e{email}, \e{mailman}, \e{mailer}, \e{letterbox}, etc.
60 \row \li Search and Replace
61 \li A regexp can replace all occurrences of a substring with a
62 different substring, e.g., replace all occurrences of \e{&}
63 with \e{\&amp;} except where the \e{&} is already followed by
64 an \e{amp;}.
65 \row \li String Splitting
66 \li A regexp can be used to identify where a string should be
67 split apart, e.g. splitting tab-delimited strings.
68 \endtable
69
70 This document is by no means a complete reference to pattern matching using
71 regular expressions, and the following parts will require the reader to
72 have some basic knowledge about Perl-like regular expressions and their
73 pattern syntax.
74
75 Good references about regular expressions include:
76
77 \list
78 \li \e {Mastering Regular Expressions} (Third Edition) by Jeffrey E. F.
79 Friedl, ISBN 0-596-52812-4;
80 \li the \l{https://pcre.org/original/doc/html/pcrepattern.html}
81 {pcrepattern(3)} man page, describing the pattern syntax supported by PCRE
82 (the reference implementation of Perl-compatible regular expressions);
83 \li the \l{http://perldoc.perl.org/perlre.html} {Perl's regular expression
84 documentation} and the \l{http://perldoc.perl.org/perlretut.html} {Perl's
85 regular expression tutorial}.
86 \endlist
87
88 \section1 Introduction
89
90 QRegularExpression implements Perl-compatible regular expressions. It fully
91 supports Unicode. For an overview of the regular expression syntax
92 supported by QRegularExpression, please refer to the aforementioned
93 pcrepattern(3) man page. A regular expression is made up of two things: a
94 \b{pattern string} and a set of \b{pattern options} that change the
95 meaning of the pattern string.
96
97 You can set the pattern string by passing a string to the QRegularExpression
98 constructor:
99
100 \snippet code/src_corelib_text_qregularexpression.cpp 0
101
102 This sets the pattern string to \c{a pattern}. You can also use the
103 setPattern() function to set a pattern on an existing QRegularExpression
104 object:
105
106 \snippet code/src_corelib_text_qregularexpression.cpp 1
107
108 Note that due to C++ literal strings rules, you must escape all backslashes
109 inside the pattern string with another backslash:
110
111 \snippet code/src_corelib_text_qregularexpression.cpp 2
112
113 Alternatively, you can use a
114 \l {https://en.cppreference.com/w/cpp/language/string_literal} {raw string literal},
115 in which case you don't need to escape backslashes in the pattern; all characters
116 between \c {R"(...)"} are considered raw characters. As you can see in the following
117 example, this simplifies writing patterns:
118
119 \snippet code/src_corelib_text_qregularexpression.cpp 35
120
121 The pattern() function returns the pattern that is currently set for a
122 QRegularExpression object:
123
124 \snippet code/src_corelib_text_qregularexpression.cpp 3
125
126 \section1 Pattern Options
127
128 The meaning of the pattern string can be modified by setting one or more
129 \e{pattern options}. For instance, it is possible to set a pattern to match
130 case insensitively by setting the QRegularExpression::CaseInsensitiveOption.
131
132 You can set the options by passing them to the QRegularExpression
133 constructor, as in:
134
135 \snippet code/src_corelib_text_qregularexpression.cpp 4
136
137 Alternatively, you can use the setPatternOptions() function on an existing
138 QRegularExpression object:
139
140 \snippet code/src_corelib_text_qregularexpression.cpp 5
141
142 It is possible to get the pattern options currently set on a
143 QRegularExpression object by using the patternOptions() function:
144
145 \snippet code/src_corelib_text_qregularexpression.cpp 6
146
147 Please refer to the QRegularExpression::PatternOption enum documentation for
148 more information about each pattern option.
149
150 \section1 Match Type and Match Options
151
152 The last two arguments of the match() and the globalMatch() functions set
153 the match type and the match options. The match type is a value of the
154 QRegularExpression::MatchType enum; the "traditional" matching algorithm is
155 chosen by using the NormalMatch match type (the default). It is also
156 possible to enable partial matching of the regular expression against a
157 subject string: see the \l{partial matching} section for more details.
158
159 The match options are a set of one or more QRegularExpression::MatchOption
160 values. They change the way a specific match of a regular expression
161 against a subject string is done. Please refer to the
162 QRegularExpression::MatchOption enum documentation for more details.
163
164 \target normal matching
165 \section1 Normal Matching
166
167 In order to perform a match you can simply invoke the match() function
168 passing a string to match against. We refer to this string as the
169 \e{subject string}. The result of the match() function is a
170 QRegularExpressionMatch object that can be used to inspect the results of
171 the match. For instance:
172
173 \snippet code/src_corelib_text_qregularexpression.cpp 7
174
175 If a match is successful, the (implicit) capturing group number 0 can be
176 used to retrieve the substring matched by the entire pattern (see also the
177 section about \l{extracting captured substrings}):
178
179 \snippet code/src_corelib_text_qregularexpression.cpp 8
180
181 It's also possible to start a match at an arbitrary offset inside the
182 subject string by passing the offset as an argument of the
183 match() function. In the following example \c{"12 abc"}
184 is not matched because the match is started at offset 1:
185
186 \snippet code/src_corelib_text_qregularexpression.cpp 9
187
188 \target extracting captured substrings
189 \section2 Extracting captured substrings
190
191 The QRegularExpressionMatch object also contains information about the
192 substrings captured by the capturing groups in the pattern string. The
193 \l{QRegularExpressionMatch::}{captured()} function will return the string
194 captured by the n-th capturing group:
195
196 \snippet code/src_corelib_text_qregularexpression.cpp 10
197
198 Capturing groups in the pattern are numbered starting from 1, and the
199 implicit capturing group 0 is used to capture the substring that matched
200 the entire pattern.
201
202 It's also possible to retrieve the starting and the ending offsets (inside
203 the subject string) of each captured substring, by using the
204 \l{QRegularExpressionMatch::}{capturedStart()} and the
205 \l{QRegularExpressionMatch::}{capturedEnd()} functions:
206
207 \snippet code/src_corelib_text_qregularexpression.cpp 11
208
209 All of these functions have an overload taking a QString as a parameter
210 in order to extract \e{named} captured substrings. For instance:
211
212 \snippet code/src_corelib_text_qregularexpression.cpp 12
213
214 \target global matching
215 \section1 Global Matching
216
217 \e{Global matching} is useful to find all the occurrences of a given
218 regular expression inside a subject string. Suppose that we want to extract
219 all the words from a given string, where a word is a substring matching
220 the pattern \c{\w+}.
221
222 QRegularExpression::globalMatch returns a QRegularExpressionMatchIterator,
223 which is a Java-like forward iterator that can be used to iterate over the
224 results. For instance:
225
226 \snippet code/src_corelib_text_qregularexpression.cpp 13
227
228 Since it's a Java-like iterator, the QRegularExpressionMatchIterator will
229 point immediately before the first result. Every result is returned as a
230 QRegularExpressionMatch object. The
231 \l{QRegularExpressionMatchIterator::}{hasNext()} function will return true
232 if there's at least one more result, and
233 \l{QRegularExpressionMatchIterator::}{next()} will return the next result
234 and advance the iterator. Continuing from the previous example:
235
236 \snippet code/src_corelib_text_qregularexpression.cpp 14
237
238 You can also use \l{QRegularExpressionMatchIterator::}{peekNext()} to get
239 the next result without advancing the iterator.
240
241 It is also possible to simply use the result of
242 QRegularExpression::globalMatch in a range-based for loop, for instance
243 like this:
244
245 \snippet code/src_corelib_text_qregularexpression.cpp 34
246
247 It is possible to pass a starting offset and one or more match options to
248 the globalMatch() function, exactly like normal matching with match().
249
250 \target partial matching
251 \section1 Partial Matching
252
253 A \e{partial match} is obtained when the end of the subject string is
254 reached, but more characters are needed to successfully complete the match.
255 Note that a partial match is usually much less efficient than a normal
256 match because many optimizations of the matching algorithm cannot be
257 employed.
258
259 A partial match must be explicitly requested by specifying a match type of
260 PartialPreferCompleteMatch or PartialPreferFirstMatch when calling
261 QRegularExpression::match or QRegularExpression::globalMatch. If a partial
262 match is found, then calling the \l{QRegularExpressionMatch::}{hasMatch()}
263 function on the QRegularExpressionMatch object returned by match() will
264 return \c{false}, but \l{QRegularExpressionMatch::}{hasPartialMatch()} will return
265 \c{true}.
266
267 When a partial match is found, no captured substrings are returned, and the
268 (implicit) capturing group 0 corresponding to the whole match captures the
269 partially matched substring of the subject string.
270
271 Note that asking for a partial match can still lead to a complete match, if
272 one is found; in this case, \l{QRegularExpressionMatch::}{hasMatch()} will
273 return \c{true} and \l{QRegularExpressionMatch::}{hasPartialMatch()}
274 \c{false}. It never happens that a QRegularExpressionMatch reports both a
275 partial and a complete match.
276
277 Partial matching is mainly useful in two scenarios: validating user input
278 in real time and incremental/multi-segment matching.
279
280 \target validating user input
281 \section2 Validating user input
282
283 Suppose that we would like the user to input a date in a specific
284 format, for instance "MMM dd, yyyy". We can check the input validity with
285 a pattern like:
286
287 \c{^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d?, \d\d\d\d$}
288
289 (This pattern doesn't catch invalid days, but let's keep it for the
290 example's purposes).
291
292 We would like to validate the input with this regular expression \e{while}
293 the user is typing it, so that we can report an error in the input as soon
294 as it is committed (for instance, the user typed the wrong key). In order
295 to do so we must distinguish three cases:
296
297 \list
298 \li the input cannot possibly match the regular expression;
299 \li the input does match the regular expression;
300 \li the input does not match the regular expression right now,
301 but it will if more characters are added to it.
302 \endlist
303
304 Note that these three cases represent exactly the possible states of a
305 QValidator (see the QValidator::State enum).
306
307 In particular, in the last case we want the regular expression engine to
308 report a partial match: we are successfully matching the pattern against
309 the subject string but the matching cannot continue because the end of the
310 subject is encountered. Notice, however, that the matching algorithm should
311 continue and try all possibilities, and in case a complete (non-partial)
312 match is found, then this one should be reported, and the input string
313 accepted as fully valid.
314
315 This behavior is implemented by the PartialPreferCompleteMatch match type.
316 For instance:
317
318 \snippet code/src_corelib_text_qregularexpression.cpp 15
319
320 If matching the same regular expression against the subject string leads to
321 a complete match, it is reported as usual:
322
323 \snippet code/src_corelib_text_qregularexpression.cpp 16
324
325 Another example with a different pattern, showing the behavior of
326 preferring a complete match over a partial one:
327
328 \snippet code/src_corelib_text_qregularexpression.cpp 17
329
330 In this case, the subpattern \c{abc\\w+X} partially matches the subject
331 string; however, the subpattern \c{def} matches the subject string
332 completely, and therefore a complete match is reported.
333
334 If multiple partial matches are found when matching (but no complete
335 match), then the QRegularExpressionMatch object will report the first one
336 that is found. For instance:
337
338 \snippet code/src_corelib_text_qregularexpression.cpp 18
339
340 \section2 Incremental/multi-segment matching
341
342 Incremental matching is another use case of partial matching. Suppose that
343 we want to find the occurrences of a regular expression inside a large text
344 (that is, substrings matching the regular expression). In order to do so we
345 would like to "feed" the large text to the regular expression engine in
346 smaller chunks. The obvious problem is what happens if the substring that
347 matches the regular expression spans across two or more chunks.
348
349 In this case, the regular expression engine should report a partial match,
350 so that we can match again adding new data and (eventually) get a complete
351 match. This implies that the regular expression engine may assume that
352 there are other characters \e{beyond the end} of the subject string. This
353 is not to be taken literally -- the engine will never try to access
354 any character after the last one in the subject.
355
356 QRegularExpression implements this behavior when using the
357 PartialPreferFirstMatch match type. This match type reports a partial match
358 as soon as it is found, and other match alternatives are not tried
359 (even if they could lead to a complete match). For instance:
360
361 \snippet code/src_corelib_text_qregularexpression.cpp 19
362
363 This happens because when matching the first branch of the alternation
364 operator a partial match is found, and therefore matching stops, without
365 trying the second branch. Another example:
366
367 \snippet code/src_corelib_text_qregularexpression.cpp 20
368
369 This shows what could seem a counterintuitive behavior of quantifiers:
370 since \c{?} is greedy, then the engine tries first to continue the match
371 after having matched \c{"abc"}; but then the matching reaches the end of the
372 subject string, and therefore a partial match is reported. This is
373 even more surprising in the following example:
374
375 \snippet code/src_corelib_text_qregularexpression.cpp 21
376
377 It's easy to understand this behavior if we remember that the engine
378 expects the subject string to be only a substring of the whole text in
379 which we're looking for a match (that is, as we said before, the engine
380 assumes that there are other characters beyond the end of the subject
381 string).
382
383 Since the \c{*} quantifier is greedy, then reporting a complete match could
384 be an error, because after the current subject \c{"abc"} there may be other
385 occurrences of \c{"abc"}. For instance, the complete text could have been
386 "abcabcX", and therefore the \e{right} match to report (in the complete
387 text) would have been \c{"abcabc"}; by matching only against the leading
388 \c{"abc"} we instead get a partial match.
389
390 \section1 Error Handling
391
392 It is possible for a QRegularExpression object to be invalid because of
393 syntax errors in the pattern string. The isValid() function will return
394 true if the regular expression is valid, or false otherwise:
395
396 \snippet code/src_corelib_text_qregularexpression.cpp 22
397
398 You can get more information about the specific error by calling the
399 errorString() function; moreover, the patternErrorOffset() function
400 will return the offset inside the pattern string at which the error was found:
401
402 \snippet code/src_corelib_text_qregularexpression.cpp 23
403
404 If a match is attempted with an invalid QRegularExpression, then the
405 returned QRegularExpressionMatch object will be invalid as well (that is,
406 its \l{QRegularExpressionMatch::}{isValid()} function will return false).
407 The same applies for attempting a global match.
408
409 \section1 Unsupported Perl-compatible Regular Expressions Features
410
411 QRegularExpression does not support all the features available in
412 Perl-compatible regular expressions. The most notable one is the fact that
413 duplicated names for capturing groups are not supported, and using them can
414 lead to undefined behavior.
415
416 This may change in a future version of Qt.
417
418 \section1 Debugging Code that Uses QRegularExpression
419
420 QRegularExpression internally uses a just in time compiler (JIT) to
421 optimize the execution of the matching algorithm. The JIT makes extensive
422 use of self-modifying code, which can lead debugging tools such as
423 Valgrind to crash. You must enable all checks for self-modifying code if
424 you want to debug programs using QRegularExpression (for instance, Valgrind's
425 \c{--smc-check} command line option). The downside of enabling such checks
426 is that your program will run considerably slower.
427
428 To avoid that, the JIT is disabled by default if you compile Qt in debug
429 mode. It is possible to override the default and enable or disable the JIT
430 (in both debug and release mode) by setting the
431 \c{QT_ENABLE_REGEXP_JIT} environment variable to a non-zero or zero value
432 respectively.
433
434 \sa QRegularExpressionMatch, QRegularExpressionMatchIterator
435*/
436
437/*!
438 \class QRegularExpressionMatch
439 \inmodule QtCore
440 \reentrant
441
442 \brief The QRegularExpressionMatch class provides the results of matching
443 a QRegularExpression against a string.
444
445 \since 5.0
446
447 \ingroup tools
448 \ingroup shared
449 \ingroup string-processing
450
451 \keyword regular expression match
452
453 A QRegularExpressionMatch object can be obtained by calling the
454 QRegularExpression::match() function, or as a single result of a global
455 match from a QRegularExpressionMatchIterator.
456
457 The success or the failure of a match attempt can be inspected by calling
458 the hasMatch() function. QRegularExpressionMatch also reports a successful
459 partial match through the hasPartialMatch() function.
460
461 In addition, QRegularExpressionMatch returns the substrings captured by the
462 capturing groups in the pattern string. The implicit capturing group with
463 index 0 captures the result of the whole match. The captured() function
464 returns each substring captured, either by the capturing group's index or
465 by its name:
466
467 \snippet code/src_corelib_text_qregularexpression.cpp 29
468
469 For each captured substring it is possible to query its starting and ending
470 offsets in the subject string by calling the capturedStart() and the
471 capturedEnd() function, respectively. The length of each captured
472 substring is available using the capturedLength() function.
473
474 The convenience function capturedTexts() will return \e{all} the captured
475 substrings at once (including the substring matched by the entire pattern)
476 in the order they have been captured by capturing groups; that is,
477 \c{captured(i) == capturedTexts().at(i)}.
478
479 You can retrieve the QRegularExpression object the subject string was
480 matched against by calling the regularExpression() function; the
481 match type and the match options are available as well by calling
482 the matchType() and the matchOptions() functions, respectively.
483
484 Please refer to the QRegularExpression documentation for more information
485 about the Qt regular expression classes.
486
487 \sa QRegularExpression
488*/
489
490/*!
491 \class QRegularExpressionMatchIterator
492 \inmodule QtCore
493 \reentrant
494
495 \brief The QRegularExpressionMatchIterator class provides an iterator on
496 the results of a global match of a QRegularExpression object against a string.
497
498 \since 5.0
499
500 \ingroup tools
501 \ingroup shared
502 \ingroup string-processing
503
504 \keyword regular expression iterator
505
506 A QRegularExpressionMatchIterator object is a forward-only Java-like
507 iterator; it can be obtained by calling the
508 QRegularExpression::globalMatch() function. A new
509 QRegularExpressionMatchIterator will be positioned before the first result.
510 You can then call the hasNext() function to check if there are more
511 results available; if so, the next() function will return the next
512 result and advance the iterator.
513
514 Each result is a QRegularExpressionMatch object holding all the information
515 for that result (including captured substrings).
516
517 For instance:
518
519 \snippet code/src_corelib_text_qregularexpression.cpp 30
520
521 Moreover, QRegularExpressionMatchIterator offers a peekNext() function
522 to get the next result \e{without} advancing the iterator.
523
524 Starting with Qt 6.0, it is also possible to simply use the result of
525 QRegularExpression::globalMatch in a range-based for loop, for instance
526 like this:
527
528 \snippet code/src_corelib_text_qregularexpression.cpp 34
529
530 You can retrieve the QRegularExpression object the subject string was
531 matched against by calling the regularExpression() function; the
532 match type and the match options are available as well by calling
533 the matchType() and the matchOptions() functions, respectively.
534
535 Please refer to the QRegularExpression documentation for more information
536 about the Qt regular expression classes.
537
538 \sa QRegularExpression, QRegularExpressionMatch
539*/
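
// A minimal sketch of the Java-like iteration protocol described above
// (positioned before the first result, hasNext(), next(), peekNext()).
// It assumes a stand-in built on std::regex and is not how
// QRegularExpressionMatchIterator is implemented; the names MatchIterator
// and collectMatches are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a forward-only, Java-like match iterator.
class MatchIterator
{
public:
    MatchIterator(std::string subject, const std::string &pattern)
        : m_subject(std::move(subject)),
          m_re(pattern),
          m_current(m_subject.cbegin(), m_subject.cend(), m_re)
    {
    }

    // The internal iterator refers to m_subject and m_re, so copying
    // would leave the copy pointing into the original object.
    MatchIterator(const MatchIterator &) = delete;
    MatchIterator &operator=(const MatchIterator &) = delete;

    // True if at least one more result is available.
    bool hasNext() const { return m_current != m_end; }

    // Returns the next result without advancing the iterator.
    std::string peekNext() const
    {
        assert(hasNext());
        return m_current->str();
    }

    // Returns the next result and advances the iterator past it.
    std::string next()
    {
        assert(hasNext());
        std::string result = m_current->str();
        ++m_current;
        return result;
    }

private:
    std::string m_subject;          // kept alive for the iterator below
    std::regex m_re;                // kept alive for the iterator below
    std::sregex_iterator m_current; // points at the next unread result
    std::sregex_iterator m_end;     // default-constructed end marker
};

// Drains the iterator with the usual hasNext()/next() loop.
std::vector<std::string> collectMatches(const std::string &subject,
                                        const std::string &pattern)
{
    std::vector<std::string> results;
    MatchIterator it(subject, pattern);
    while (it.hasNext())
        results.push_back(it.next());
    return results;
}
```

// In this sketch the subject and the compiled pattern are owned by the
// iterator, so results stay valid for as long as the iterator itself lives.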
540
541
542/*!
543 \enum QRegularExpression::PatternOption
544
545 The PatternOption enum defines modifiers to the way the pattern string
546 should be interpreted, and therefore the way the pattern matches against a
547 subject string.
548
549 \value NoPatternOption
550 No pattern options are set.
551
552 \value CaseInsensitiveOption
553 The pattern should match against the subject string in a case
554 insensitive way. This option corresponds to the \c{/i} modifier in Perl
555 regular expressions.
556
557 \value DotMatchesEverythingOption
558 The dot metacharacter (\c{.}) in the pattern string is allowed to match
559 any character in the subject string, including newlines (normally, the
560 dot does not match newlines). This option corresponds to the \c{/s}
561 modifier in Perl regular expressions.
562
563 \value MultilineOption
564 The caret (\c{^}) and the dollar (\c{$}) metacharacters in the pattern
565 string are allowed to match, respectively, immediately after and
566 immediately before any newline in the subject string, as well as at the
567 very beginning and at the very end of the subject string. This option
568 corresponds to the \c{/m} modifier in Perl regular expressions.
569
570 \value ExtendedPatternSyntaxOption
571 Any whitespace in the pattern string which is not escaped and outside a
572 character class is ignored. Moreover, an unescaped sharp (\b{#})
573 outside a character class causes all the following characters, until
574 the first newline (included), to be ignored. This can be used to
575 increase the readability of a pattern string as well as put comments
576 inside regular expressions; this is particularly useful if the pattern
577 string is loaded from a file or written by the user, because in C++
578 code it is always possible to use the rules for string literals to put
579 comments outside the pattern string. This option corresponds to the \c{/x}
580 modifier in Perl regular expressions.
581
582 \value InvertedGreedinessOption
583 The greediness of the quantifiers is inverted: \c{*}, \c{+}, \c{?},
584 \c{{m,n}}, etc. become lazy, while their lazy versions (\c{*?},
585 \c{+?}, \c{??}, \c{{m,n}?}, etc.) become greedy. There is no equivalent
586 for this option in Perl regular expressions.
587
588 \value DontCaptureOption
589 The non-named capturing groups do not capture substrings; named
590 capturing groups still work as intended, as well as the implicit
591 capturing group number 0 corresponding to the entire match. There is no
592 equivalent for this option in Perl regular expressions.
593
594 \value UseUnicodePropertiesOption
595 The meaning of the \c{\w}, \c{\d}, etc., character classes, as well as
596 the meaning of their counterparts (\c{\W}, \c{\D}, etc.), is changed
597 from matching ASCII characters only to matching any character with the
598 corresponding Unicode property. For instance, \c{\d} is changed to
599 match any character with the Unicode Nd (decimal digit) property;
600 \c{\w} to match any character with either the Unicode L (letter) or N
601 (digit) property, plus underscore, and so on. This option corresponds
602 to the \c{/u} modifier in Perl regular expressions.
603*/
604
605/*!
606 \enum QRegularExpression::MatchType
607
608 The MatchType enum defines the type of the match that should be attempted
609 against the subject string.
610
611 \value NormalMatch
612 A normal match is done.
613
614 \value PartialPreferCompleteMatch
615 The pattern string is matched partially against the subject string. If
616 a partial match is found, then it is recorded, and other matching
617 alternatives are tried as usual. If a complete match is then found,
618 then it's preferred to the partial match; in this case only the
619 complete match is reported. If instead no complete match is found (but
620 only the partial one), then the partial one is reported.
621
622 \value PartialPreferFirstMatch
623 The pattern string is matched partially against the subject string. If
624 a partial match is found, then matching stops and the partial match is
625 reported. In this case, other matching alternatives (potentially
626 leading to a complete match) are not tried. Moreover, this match type
627 assumes that the subject string is only a substring of a larger text, and
628 that (in this text) there are other characters beyond the end of the
629 subject string. This can lead to surprising results; see the discussion
630 in the \l{partial matching} section for more details.
631
632 \value NoMatch
633 No matching is done. This value is returned as the match type by a
634 default constructed QRegularExpressionMatch or
635 QRegularExpressionMatchIterator. Using this match type is not very
636 useful for the user, as no matching ever happens. This enum value
637 has been introduced in Qt 5.1.
638*/
639
640/*!
641 \enum QRegularExpression::MatchOption
642
643 \value NoMatchOption
644 No match options are set.
645
646 \value AnchoredMatchOption
647 Use AnchorAtOffsetMatchOption instead.
648
649 \value AnchorAtOffsetMatchOption
650 The match is constrained to start exactly at the offset passed to
651 match() in order to be successful, even if the pattern string does not
652 contain any metacharacter that anchors the match at that point.
653 Note that passing this option does not anchor the end of the match
654 to the end of the subject; if you want to fully anchor a regular
655 expression, use anchoredPattern().
656 This enum value has been introduced in Qt 6.0.
657
658 \value DontCheckSubjectStringMatchOption
659 The subject string is not checked for UTF-16 validity before
660 attempting a match. Use this option with extreme caution, as
661 attempting to match an invalid string may crash the program and/or
662 constitute a security issue. This enum value has been introduced in
663 Qt 5.4.
664*/
665
666/*!
667 \internal
668*/
669static int convertToPcreOptions(QRegularExpression::PatternOptions patternOptions)
670{
671 int options = 0;
672
673 if (patternOptions & QRegularExpression::CaseInsensitiveOption)
674 options |= PCRE2_CASELESS;
675 if (patternOptions & QRegularExpression::DotMatchesEverythingOption)
676 options |= PCRE2_DOTALL;
677 if (patternOptions & QRegularExpression::MultilineOption)
678 options |= PCRE2_MULTILINE;
679 if (patternOptions & QRegularExpression::ExtendedPatternSyntaxOption)
680 options |= PCRE2_EXTENDED;
681 if (patternOptions & QRegularExpression::InvertedGreedinessOption)
682 options |= PCRE2_UNGREEDY;
683 if (patternOptions & QRegularExpression::DontCaptureOption)
684 options |= PCRE2_NO_AUTO_CAPTURE;
685 if (patternOptions & QRegularExpression::UseUnicodePropertiesOption)
686 options |= PCRE2_UCP;
687
688 return options;
689}
690
691/*!
692 \internal
693*/
694static int convertToPcreOptions(QRegularExpression::MatchOptions matchOptions)
695{
696 int options = 0;
697
698 if (matchOptions & QRegularExpression::AnchorAtOffsetMatchOption)
699 options |= PCRE2_ANCHORED;
700 if (matchOptions & QRegularExpression::DontCheckSubjectStringMatchOption)
701 options |= PCRE2_NO_UTF_CHECK;
702
703 return options;
704}
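
// A minimal, self-contained sketch of the flag-translation pattern used by
// the two convertToPcreOptions() overloads above. All names and numeric
// values here (PublicOption, BackendOption, toBackendOptions) are
// hypothetical placeholders, not Qt's or PCRE2's actual values.

```cpp
#include <cstdint>

// Hypothetical public flags; the values are illustrative only.
enum PublicOption : std::uint32_t {
    NoOption = 0x0,
    CaseInsensitive = 0x1,
    Multiline = 0x4,
};

// Hypothetical backend flags; the values are illustrative only.
enum BackendOption : std::uint32_t {
    BackendCaseless = 0x08,
    BackendMultiline = 0x400,
};

// Translates each public bit into the corresponding backend bit.
// Bits without a mapping are silently dropped.
std::uint32_t toBackendOptions(std::uint32_t publicOptions)
{
    std::uint32_t options = 0;

    if (publicOptions & CaseInsensitive)
        options |= BackendCaseless;
    if (publicOptions & Multiline)
        options |= BackendMultiline;

    return options;
}
```

// Testing and setting each bit explicitly, rather than casting, keeps the
// public enum values independent of the backend's numeric constants.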
705
707{
711
716
721
723 qsizetype offset,
724 CheckSubjectStringOption checkSubjectStringOption = CheckSubjectString,
725 const QRegularExpressionMatchPrivate *previous = nullptr) const;
726
727 int captureIndexForName(QAnyStringView name) const;
728
729 // sizeof(QSharedData) == 4, so start our members with an enum
732
733 // *All* of the following members are managed while holding this mutex,
734 // except for isDirty which is set to true by QRegularExpression setters
735 // (right after a detach happened).
736 mutable QMutex mutex;
737
738 // The PCRE code pointer is reference-counted by the QRegularExpressionPrivate
739 // objects themselves; when the private is copied (i.e. a detach happened)
740 // it is set to nullptr
741 pcre2_code_16 *compiledPattern;
747};
748
750{
751 QRegularExpressionMatchPrivate(const QRegularExpression &re,
752 const QString &subjectStorage,
753 QStringView subject,
754 QRegularExpression::MatchType matchType,
755 QRegularExpression::MatchOptions matchOptions);
756
757 QRegularExpressionMatch nextMatch() const;
758
759 const QRegularExpression regularExpression;
760
761 // subject is what we match upon. If we've been asked to match over
762 // a QString, then subjectStorage is a copy of that string
763 // (so that it's kept alive by us)
766
769
770 // the capturedOffsets vector contains pairs of (start, end) positions
771 // for each captured substring
773
775
776 bool hasMatch = false;
777 bool hasPartialMatch = false;
778 bool isValid = false;
779};
780
782{
783 QRegularExpressionMatchIteratorPrivate(const QRegularExpression &re,
784 QRegularExpression::MatchType matchType,
785 QRegularExpression::MatchOptions matchOptions,
786 const QRegularExpressionMatch &next);
787
788 bool hasNext() const;
789 QRegularExpressionMatch next;
790 const QRegularExpression regularExpression;
793};
794
795/*!
796 \internal
797
798 Used to centralize the warning about using an invalid QRegularExpression.
799 In case the pattern is an illegal UTF-16 string, we can't print it
800 (pass it to qUtf16Printable, etc.), so we need to check for that.
801*/
803void qtWarnAboutInvalidRegularExpression(const QString &pattern, const char *cls, const char *method)
804{
805 if (pattern.isValidUtf16()) {
806 qWarning("%s::%s(): called on an invalid QRegularExpression object "
807 "(pattern is '%ls')", cls, method, qUtf16Printable(pattern));
808 } else {
809 qWarning("%s::%s(): called on an invalid QRegularExpression object",
810 cls, method);
811 }
812}
813
814/*!
815 \internal
816*/
817QRegularExpression::QRegularExpression(QRegularExpressionPrivate &dd)
818 : d(&dd)
819{
820}
821
822/*!
823 \internal
824*/
826 : QSharedData(),
828 pattern(),
829 mutex(),
830 compiledPattern(nullptr),
831 errorCode(0),
832 errorOffset(-1),
834 usingCrLfNewlines(false),
835 isDirty(true)
836{
837}
838
839/*!
840 \internal
841*/
846
847/*!
848 \internal
849
850 Copies the private, which means copying only the pattern and the pattern
851 options. The compiledPattern pointer is NOT copied (we
852 do not own it any more), and in general all the members set when
853 compiling a pattern are set to default values. isDirty is set back to true
854 so that the pattern has to be recompiled again.
855*/
869
870/*!
871 \internal
872*/
874{
875 pcre2_code_free_16(compiledPattern);
876 compiledPattern = nullptr;
877 errorCode = 0;
878 errorOffset = -1;
879 capturingCount = 0;
880 usingCrLfNewlines = false;
881}
882
883/*!
884 \internal
885*/
887{
888 const QMutexLocker lock(&mutex);
889
890 if (!isDirty)
891 return;
892
893 isDirty = false;
895
896 int options = convertToPcreOptions(patternOptions);
897 options |= PCRE2_UTF;
898
899 PCRE2_SIZE patternErrorOffset;
900 compiledPattern = pcre2_compile_16(reinterpret_cast<PCRE2_SPTR16>(pattern.constData()),
901 pattern.size(),
902 options,
903 &errorCode,
904 &patternErrorOffset,
905 nullptr);
906
907 if (!compiledPattern) {
908 errorOffset = qsizetype(patternErrorOffset);
909 return;
910 } else {
911 // ignore whatever PCRE2 wrote into errorCode -- leave it to 0 to mean "no error"
912 errorCode = 0;
913 }
914
917}
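The lazy compilation scheme above (a dirty flag set by the setters, cleared under the mutex the first time the compiled form is needed) can be sketched in standard C++. This is only an illustration under stated assumptions: `LazyPattern`, `compileCalls`, and the parenthesis-balancing "compiler" are hypothetical stand-ins, not Qt or PCRE2 code.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a lazily compiled pattern. "Compiling" here only
// checks that parentheses are balanced; a real engine would do far more.
struct LazyPattern {
    std::string pattern;
    bool isDirty = true;                  // set by setters, cleared by compile()
    std::optional<std::string> compiled;  // empty means "pattern is invalid"
    int compileCalls = 0;                 // instrumentation for this sketch only
    std::mutex mutex;

    void setPattern(std::string p)
    {
        if (p == pattern)
            return;                       // unchanged: keep the cached result
        pattern = std::move(p);
        isDirty = true;
    }

    void compile()
    {
        const std::lock_guard<std::mutex> lock(mutex);
        if (!isDirty)
            return;                       // already compiled since last change
        isDirty = false;
        ++compileCalls;
        compiled.reset();

        int depth = 0;
        for (char c : pattern) {
            if (c == '(')
                ++depth;
            else if (c == ')' && --depth < 0)
                return;                   // unbalanced: leave 'compiled' empty
        }
        if (depth == 0)
            compiled = pattern;
    }

    bool isValid()
    {
        compile();                        // compiles on demand, at most once
        return compiled.has_value();
    }
};
```

Calling `isValid()` twice on an unchanged object compiles once; calling a setter with a different value marks the object dirty so that the next query recompiles.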
918
919/*!
920 \internal
921*/
923{
924 Q_ASSERT(compiledPattern);
925
926 pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_CAPTURECOUNT, &capturingCount);
927
928 // detect the settings for the newline
929 unsigned int patternNewlineSetting;
930 if (pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_NEWLINE, &patternNewlineSetting) != 0) {
931 // no option was specified in the regexp, grab PCRE build defaults
932 pcre2_config_16(PCRE2_CONFIG_NEWLINE, &patternNewlineSetting);
933 }
934
935 usingCrLfNewlines = (patternNewlineSetting == PCRE2_NEWLINE_CRLF) ||
936 (patternNewlineSetting == PCRE2_NEWLINE_ANY) ||
937 (patternNewlineSetting == PCRE2_NEWLINE_ANYCRLF);
938
939 unsigned int hasJOptionChanged;
940 pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_JCHANGED, &hasJOptionChanged);
941 if (Q_UNLIKELY(hasJOptionChanged)) {
942 qWarning("QRegularExpressionPrivate::getPatternInfo(): the pattern '%ls'\n is using the (?J) option; duplicate capturing group names are not supported by Qt",
943 qUtf16Printable(pattern));
944 }
945}
946
947
948/*
949 Simple "smartpointer" wrapper around a pcre2_jit_stack_16, to be used with
950 QThreadStorage.
951*/
952namespace {
953struct PcreJitStackFree
954{
955 void operator()(pcre2_jit_stack_16 *stack)
956 {
957 if (stack)
958 pcre2_jit_stack_free_16(stack);
959 }
960};
961Q_CONSTINIT static thread_local std::unique_ptr<pcre2_jit_stack_16, PcreJitStackFree> jitStacks;
962}
963
964/*!
965 \internal
966*/
967static pcre2_jit_stack_16 *qtPcreCallback(void *)
968{
969 return jitStacks.get();
970}
971
972/*!
973 \internal
974*/
975static bool isJitEnabled()
976{
977 QByteArray jitEnvironment = qgetenv("QT_ENABLE_REGEXP_JIT");
978 if (!jitEnvironment.isEmpty()) {
979 bool ok;
980 int enableJit = jitEnvironment.toInt(&ok);
981 return ok ? (enableJit != 0) : true;
982 }
983
984#ifdef QT_DEBUG
985 return false;
986#elif defined(Q_OS_MACOS) && !defined(QT_BOOTSTRAPPED)
987 return !qt_mac_runningUnderRosetta();
988#else
989 return true;
990#endif
991}
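The decision logic in isJitEnabled() can be restated as a pure function, which makes the three cases easier to see: an empty or unset variable falls back to the build default, a numeric value enables the JIT only if it is non-zero, and a non-numeric value enables it. The sketch below assumes a hypothetical `jitEnabledFromEnv()` helper and uses `std::from_chars`, whose parsing rules are stricter than `QByteArray::toInt()` (for example, no surrounding whitespace), so it is an approximation rather than a drop-in equivalent.

```cpp
#include <cassert>
#include <charconv>
#include <cstring>
#include <system_error>

// value:        content of the environment variable, or nullptr if unset.
// defaultValue: what the build and platform checks would otherwise return.
bool jitEnabledFromEnv(const char *value, bool defaultValue)
{
    if (!value || *value == '\0')
        return defaultValue;

    int parsed = 0;
    const char *end = value + std::strlen(value);
    const auto res = std::from_chars(value, end, parsed);

    // "ok" only if the whole string was consumed as an integer
    const bool ok = (res.ec == std::errc()) && (res.ptr == end);
    return ok ? (parsed != 0) : true;
}
```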
992
993/*!
994 \internal
995
996 The purpose of the function is to call pcre2_jit_compile_16, which
997 JIT-compiles the pattern.
998
999 It gets called when a pattern is recompiled by us (in compilePattern()),
1000 under mutex protection.
1001*/
1003{
1004 Q_ASSERT(compiledPattern);
1005
1006 static const bool enableJit = isJitEnabled();
1007
1008 if (!enableJit)
1009 return;
1010
1011 pcre2_jit_compile_16(compiledPattern, PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_SOFT | PCRE2_JIT_PARTIAL_HARD);
1012}
1013
1014/*!
1015 \internal
1016
1017 Returns the capturing group number for the given name. Duplicated names for
1018 capturing groups are not supported.
1019*/
1020int QRegularExpressionPrivate::captureIndexForName(QAnyStringView name) const
1021{
1022 Q_ASSERT(!name.isEmpty());
1023
1024 if (!compiledPattern)
1025 return -1;
1026
1027 // See the other usages of pcre2_pattern_info_16 for more details about this
1028 PCRE2_SPTR16 *namedCapturingTable;
1029 unsigned int namedCapturingTableEntryCount;
1030 unsigned int namedCapturingTableEntrySize;
1031
1032 pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_NAMETABLE, &namedCapturingTable);
1033 pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_NAMECOUNT, &namedCapturingTableEntryCount);
1034 pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_NAMEENTRYSIZE, &namedCapturingTableEntrySize);
1035
1036 for (unsigned int i = 0; i < namedCapturingTableEntryCount; ++i) {
1037 const auto currentNamedCapturingTableRow =
1038 reinterpret_cast<const char16_t *>(namedCapturingTable) + namedCapturingTableEntrySize * i;
1039
1040 if (name == (currentNamedCapturingTableRow + 1)) {
1041 const int index = *currentNamedCapturingTableRow;
1042 return index;
1043 }
1044 }
1045
1046 return -1;
1047}
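The name table walked above is a flat array of fixed-size rows: the first code unit of each row is the group number, followed by the NUL-terminated group name. A minimal sketch of that lookup over a hand-built table, in standard C++, might look as follows. `makeTable()` and this free-standing `captureIndexForName()` are hypothetical helpers for illustration; the real table comes from pcre2_pattern_info_16().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Builds a table of fixed-size rows: [group number][name...][NUL padding].
std::vector<char16_t> makeTable(const std::vector<std::pair<int, std::u16string>> &entries,
                                std::size_t entrySize)
{
    std::vector<char16_t> table;
    for (const auto &[index, name] : entries) {
        assert(name.size() + 2 <= entrySize); // number + name + terminator
        table.push_back(char16_t(index));
        table.insert(table.end(), name.begin(), name.end());
        table.resize(table.size() + (entrySize - 1 - name.size()), u'\0');
    }
    return table;
}

// Returns the group number registered for 'name', or -1 if there is none.
int captureIndexForName(const std::vector<char16_t> &table,
                        std::size_t entryCount, std::size_t entrySize,
                        std::u16string_view name)
{
    for (std::size_t i = 0; i < entryCount; ++i) {
        const char16_t *row = table.data() + entrySize * i;
        // row[0] is the group number; row + 1 is the NUL-terminated name
        if (name == std::u16string_view(row + 1))
            return int(row[0]);
    }
    return -1;
}
```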
1048
1049/*!
1050 \internal
1051
1052 This is a simple wrapper for pcre2_match_16 for handling the case in which the
1053 JIT runs out of memory. In that case, we allocate a thread-local JIT stack
1054 and re-run pcre2_match_16.
1055*/
1056static int safe_pcre2_match_16(const pcre2_code_16 *code,
1057 PCRE2_SPTR16 subject, qsizetype length,
1058 qsizetype startOffset, int options,
1059 pcre2_match_data_16 *matchData,
1060 pcre2_match_context_16 *matchContext)
1061{
1062 int result = pcre2_match_16(code, subject, length,
1063 startOffset, options, matchData, matchContext);
1064
1065 if (result == PCRE2_ERROR_JIT_STACKLIMIT && !jitStacks) {
1066 // The default JIT stack size in PCRE is 32K,
1067 // we allocate from 32K up to 512K.
1068 jitStacks.reset(pcre2_jit_stack_create_16(32 * 1024, 512 * 1024, NULL));
1069
1070 result = pcre2_match_16(code, subject, length,
1071 startOffset, options, matchData, matchContext);
1072 }
1073
1074 return result;
1075}
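The wrapper above follows a "retry once with a lazily allocated, thread-local resource" pattern. The sketch below shows that pattern in isolation, assuming hypothetical names (`BigStack`, `kStackLimit`, `matchWithRetry`) and passing the resource directly to the attempt instead of through a PCRE2 callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

constexpr int kStackLimit = -1; // hypothetical "out of stack" error code

struct BigStack { std::size_t maxSize; };

// One lazily created resource per thread, freed automatically at thread exit.
thread_local std::unique_ptr<BigStack> bigStack;

// Runs 'attempt' with the current thread's stack (possibly null). If it fails
// for lack of stack and no big stack exists yet, allocates one and retries once.
int matchWithRetry(const std::function<int(const BigStack *)> &attempt)
{
    int result = attempt(bigStack.get());

    if (result == kStackLimit && !bigStack) {
        bigStack = std::make_unique<BigStack>(BigStack{512 * 1024});
        result = attempt(bigStack.get());
    }

    return result;
}
```

Because the resource is kept after the first failure, later calls on the same thread start with it already in place and do not pay for a second attempt.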
1076
1077/*!
1078 \internal
1079
1080 Performs a match on the subject string view held by \a priv. The
1081 match will be of type priv->matchType and using the options
1082 priv->matchOptions; the matching \a offset is relative to the
1083 substring, and if negative, it's taken as an offset from the end of
1084 the substring.
1085
1086 It also advances a match if a previous result is given as \a
1087 previous. The subject string undergoes a Unicode validity check if
1088 \a checkSubjectStringOption is CheckSubjectString and the match options don't
1089 include DontCheckSubjectStringMatchOption (PCRE doesn't like illegal
1090 UTF-16 sequences).
1091
1092 \a priv is modified to hold the results of the match.
1093
1094 Advancing a match is a tricky algorithm. If the previous match matched a
1095 non-empty string, we just do an ordinary match at the offset position.
1096
1097 If the previous match matched an empty string, then an anchored, non-empty
1098 match is attempted at the offset position. If that succeeds, then we got
1099 the next match and we can return it. Otherwise, we advance by 1 position
1100 (which can be one or two code units in UTF-16!) and reattempt a "normal"
1101 match. We also have the problem of detecting the current newline format: if
1102 the new advanced offset is pointing to the beginning of a CRLF sequence, we
1103 must advance over it.
1104*/
1106 qsizetype offset,
1107 CheckSubjectStringOption checkSubjectStringOption,
1108 const QRegularExpressionMatchPrivate *previous) const
1109{
1110 Q_ASSERT(priv);
1111 Q_ASSERT(priv != previous);
1112
1113 const qsizetype subjectLength = priv->subject.size();
1114
1115 if (offset < 0)
1116 offset += subjectLength;
1117
1118 if (offset < 0 || offset > subjectLength)
1119 return;
1120
1121 if (Q_UNLIKELY(!compiledPattern)) {
1122 qtWarnAboutInvalidRegularExpression(pattern, "QRegularExpressionPrivate", "doMatch");
1123 return;
1124 }
1125
1126 // skip doing the actual matching if NoMatch type was requested
1127 if (priv->matchType == QRegularExpression::NoMatch) {
1128 priv->isValid = true;
1129 return;
1130 }
1131
1132 int pcreOptions = convertToPcreOptions(priv->matchOptions);
1133
1134 if (priv->matchType == QRegularExpression::PartialPreferCompleteMatch)
1135 pcreOptions |= PCRE2_PARTIAL_SOFT;
1136 else if (priv->matchType == QRegularExpression::PartialPreferFirstMatch)
1137 pcreOptions |= PCRE2_PARTIAL_HARD;
1138
1139 if (checkSubjectStringOption == DontCheckSubjectString)
1140 pcreOptions |= PCRE2_NO_UTF_CHECK;
1141
1142 bool previousMatchWasEmpty = false;
1143 if (previous && previous->hasMatch &&
1144 (previous->capturedOffsets.at(0) == previous->capturedOffsets.at(1))) {
1145 previousMatchWasEmpty = true;
1146 }
1147
1148 pcre2_match_context_16 *matchContext = pcre2_match_context_create_16(nullptr);
1149 pcre2_jit_stack_assign_16(matchContext, &qtPcreCallback, nullptr);
1150 pcre2_match_data_16 *matchData = pcre2_match_data_create_from_pattern_16(compiledPattern, nullptr);
1151
1152 // PCRE does not accept a null pointer as subject string, even if
1153 // its length is zero. We however allow it in input: a QStringView
1154 // subject may have data == nullptr. In this case, to keep PCRE
1155 // happy, pass a pointer to a dummy character.
1156 const char16_t dummySubject = 0;
1157 const char16_t * const subjectUtf16 = [&]()
1158 {
1159 const auto subjectUtf16 = priv->subject.utf16();
1160 if (subjectUtf16)
1161 return subjectUtf16;
1162 Q_ASSERT(subjectLength == 0);
1163 return &dummySubject;
1164 }();
1165
1166 int result;
1167
1168 if (!previousMatchWasEmpty) {
1169 result = safe_pcre2_match_16(compiledPattern,
1170 reinterpret_cast<PCRE2_SPTR16>(subjectUtf16), subjectLength,
1171 offset, pcreOptions,
1172 matchData, matchContext);
1173 } else {
1174 result = safe_pcre2_match_16(compiledPattern,
1175 reinterpret_cast<PCRE2_SPTR16>(subjectUtf16), subjectLength,
1176 offset, pcreOptions | PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED,
1177 matchData, matchContext);
1178
1179 if (result == PCRE2_ERROR_NOMATCH) {
1180 ++offset;
1181
1183 && offset < subjectLength
1184 && subjectUtf16[offset - 1] == u'\r'
1185 && subjectUtf16[offset] == u'\n') {
1186 ++offset;
1187 } else if (offset < subjectLength
1188 && QChar::isLowSurrogate(subjectUtf16[offset])) {
1189 ++offset;
1190 }
1191
1192 result = safe_pcre2_match_16(compiledPattern,
1193 reinterpret_cast<PCRE2_SPTR16>(subjectUtf16), subjectLength,
1194 offset, pcreOptions,
1195 matchData, matchContext);
1196 }
1197 }
1198
1199#ifdef QREGULAREXPRESSION_DEBUG
1200 qDebug() << "Matching" << pattern << "against" << subject
1201 << "offset" << offset
1202 << priv->matchType << priv->matchOptions << previousMatchWasEmpty
1203 << "result" << result;
1204#endif
1205
1206 // result == 0 means not enough space in the ovector; should never happen
1207 Q_ASSERT(result != 0);
1208
1209 if (result > 0) {
1210 // full match
1211 priv->isValid = true;
1212 priv->hasMatch = true;
1213 priv->capturedCount = result;
1214 priv->capturedOffsets.resize(result * 2);
1215 } else {
1216 // no match, partial match or error
1217 priv->hasPartialMatch = (result == PCRE2_ERROR_PARTIAL);
1218 priv->isValid = (result == PCRE2_ERROR_NOMATCH || result == PCRE2_ERROR_PARTIAL);
1219
1220 if (result == PCRE2_ERROR_PARTIAL) {
1221 // partial match:
1222 // leave the start and end capture offsets (i.e. cap(0))
1223 priv->capturedCount = 1;
1224 priv->capturedOffsets.resize(2);
1225 } else {
1226 // no match or error
1227 priv->capturedCount = 0;
1228 priv->capturedOffsets.clear();
1229 }
1230 }
1231
1232 // copy the captured substrings offsets, if any
1233 if (priv->capturedCount) {
1234 PCRE2_SIZE *ovector = pcre2_get_ovector_pointer_16(matchData);
1235 qsizetype *const capturedOffsets = priv->capturedOffsets.data();
1236
1237 // We rely on the fact that capturing groups that did not
1238 // capture anything have offset -1, but PCRE technically
1239 // returns "PCRE2_UNSET". Test that out, better safe than
1240 // sorry...
1241 static_assert(qsizetype(PCRE2_UNSET) == qsizetype(-1), "Internal error: PCRE2 changed its API");
1242
1243 for (int i = 0; i < priv->capturedCount * 2; ++i)
1244 capturedOffsets[i] = qsizetype(ovector[i]);
1245
1246 // For partial matches, PCRE2 and PCRE1 differ in behavior when lookbehinds
1247 // are involved. PCRE2 reports the real begin of the match and the maximum
1248 // used lookbehind as distinct information; PCRE1 instead automatically
1249 // adjusted ovector[0] to include the maximum lookbehind.
1250 //
1251 // For instance, given the pattern "\bstring\b", and the subject "a str":
1252 // * PCRE1 reports partial, capturing " str"
1253 // * PCRE2 reports partial, capturing "str" with a lookbehind of 1
1254 //
1255 // To keep behavior, emulate PCRE1 here.
1256 // (Eventually, we could expose the lookbehind info in a future patch.)
1257 if (result == PCRE2_ERROR_PARTIAL) {
1258 unsigned int maximumLookBehind;
1259 pcre2_pattern_info_16(compiledPattern, PCRE2_INFO_MAXLOOKBEHIND, &maximumLookBehind);
1260 capturedOffsets[0] -= maximumLookBehind;
1261 }
1262 }
1263
1264 pcre2_match_data_free_16(matchData);
1265 pcre2_match_context_free_16(matchContext);
1266}
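The offset arithmetic used when the anchored, non-empty retry fails after an empty match can be isolated into a small function. The sketch below is in standard C++ and rests on assumptions: `advanceAfterEmptyMatch()` and `isLowSurrogate()` are hypothetical helpers, and the `usingCrLfNewlines` parameter mirrors the member of the same name computed in getPatternInfo().

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

bool isLowSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Returns the offset at which the next ordinary match should start after an
// empty match at 'offset'. Advances by one position, then steps over the '\n'
// of a CRLF pair or over the second half of a surrogate pair, so the new
// offset never lands in the middle of either.
std::size_t advanceAfterEmptyMatch(std::u16string_view subject,
                                   std::size_t offset,
                                   bool usingCrLfNewlines)
{
    ++offset;

    if (usingCrLfNewlines
        && offset < subject.size()
        && subject[offset - 1] == u'\r'
        && subject[offset] == u'\n') {
        ++offset;
    } else if (offset < subject.size() && isLowSurrogate(subject[offset])) {
        ++offset;
    }

    return offset;
}
```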
1267
1268/*!
1269 \internal
1270*/
1272 const QString &subjectStorage,
1273 QStringView subject,
1274 QRegularExpression::MatchType matchType,
1275 QRegularExpression::MatchOptions matchOptions)
1281{
1282}
1283
1284/*!
1285 \internal
1286*/
1287QRegularExpressionMatch QRegularExpressionMatchPrivate::nextMatch() const
1288{
1289 Q_ASSERT(isValid);
1290 Q_ASSERT(hasMatch || hasPartialMatch);
1291
1292 auto nextPrivate = new QRegularExpressionMatchPrivate(regularExpression,
1293 subjectStorage,
1294 subject,
1295 matchType,
1296 matchOptions);
1297
1298 // Note the DontCheckSubjectString passed for the check of the subject string:
1299 // if we're advancing a match on the same subject,
1300 // then that subject was already checked at least once (when this object
1301 // was created, or when the object that created this one was created, etc.)
1302 regularExpression.d->doMatch(nextPrivate,
1303 capturedOffsets.at(1),
1304 QRegularExpressionPrivate::DontCheckSubjectString,
1305 this);
1306 return QRegularExpressionMatch(*nextPrivate);
1307}
1308
1309/*!
1310 \internal
1311*/
1313 QRegularExpression::MatchType matchType,
1314 QRegularExpression::MatchOptions matchOptions,
1315 const QRegularExpressionMatch &next)
1316 : next(next),
1319{
1320}
1321
1322/*!
1323 \internal
1324*/
1326{
1327 return next.isValid() && (next.hasMatch() || next.hasPartialMatch());
1328}
1329
1330// PUBLIC API
1331
1332/*!
1333 Constructs a QRegularExpression object with an empty pattern and no pattern
1334 options.
1335
1336 \sa setPattern(), setPatternOptions()
1337*/
1338QRegularExpression::QRegularExpression()
1339 : d(new QRegularExpressionPrivate)
1340{
1341}
1342
1343/*!
1344 Constructs a QRegularExpression object using the given \a pattern as
1345 pattern and the \a options as the pattern options.
1346
1347 \sa setPattern(), setPatternOptions()
1348*/
1349QRegularExpression::QRegularExpression(const QString &pattern, PatternOptions options)
1350 : d(new QRegularExpressionPrivate)
1351{
1352 d->pattern = pattern;
1353 d->patternOptions = options;
1354}
1355
1356/*!
1357 Constructs a QRegularExpression object as a copy of \a re.
1358
1359 \sa operator=()
1360*/
1361QRegularExpression::QRegularExpression(const QRegularExpression &re) noexcept = default;
1362
1363/*!
1364 \fn QRegularExpression::QRegularExpression(QRegularExpression &&re)
1365
1366 \since 6.1
1367
1368 Constructs a QRegularExpression object by moving from \a re.
1369
1370 Note that a moved-from QRegularExpression can only be destroyed or
1371 assigned to. The effect of calling functions other than the destructor
1372 or one of the assignment operators is undefined.
1373
1374 \sa operator=()
1375*/
1376
1377/*!
1378 Destroys the QRegularExpression object.
1379*/
1380QRegularExpression::~QRegularExpression()
1381{
1382}
1383
1384QT_DEFINE_QESDP_SPECIALIZATION_DTOR(QRegularExpressionPrivate)
1385
1386/*!
1387 Assigns the regular expression \a re to this object, and returns a reference
1388 to the copy. Both the pattern and the pattern options are copied.
1389*/
1390QRegularExpression &QRegularExpression::operator=(const QRegularExpression &re) noexcept = default;
1391
1392/*!
1393 \fn void QRegularExpression::swap(QRegularExpression &other)
1394 \memberswap{regular expression}
1395*/
1396
1397/*!
1398 Returns the pattern string of the regular expression.
1399
1400 \sa setPattern(), patternOptions()
1401*/
1402QString QRegularExpression::pattern() const
1403{
1404 return d->pattern;
1405}
1406
1407/*!
1408 Sets the pattern string of the regular expression to \a pattern. The
1409 pattern options are left unchanged.
1410
1411 \sa pattern(), setPatternOptions()
1412*/
1413void QRegularExpression::setPattern(const QString &pattern)
1414{
1415 if (d->pattern == pattern)
1416 return;
1417 d.detach();
1418 d->isDirty = true;
1419 d->pattern = pattern;
1420}
1421
1422/*!
1423 Returns the pattern options for the regular expression.
1424
1425 \sa setPatternOptions(), pattern()
1426*/
1427QRegularExpression::PatternOptions QRegularExpression::patternOptions() const
1428{
1429 return d->patternOptions;
1430}
1431
1432/*!
1433 Sets the given \a options as the pattern options of the regular expression.
1434 The pattern string is left unchanged.
1435
1436 \sa patternOptions(), setPattern()
1437*/
1438void QRegularExpression::setPatternOptions(PatternOptions options)
1439{
1440 if (d->patternOptions == options)
1441 return;
1442 d.detach();
1443 d->isDirty = true;
1444 d->patternOptions = options;
1445}
1446
1447/*!
1448 Returns the number of capturing groups inside the pattern string,
1449 or -1 if the regular expression is not valid.
1450
1451 \note The implicit capturing group 0 is \e{not} included in the returned number.
1452
1453 \sa isValid()
1454*/
1455int QRegularExpression::captureCount() const
1456{
1457 if (!isValid()) // will compile the pattern
1458 return -1;
1459 return d->capturingCount;
1460}
1461
1462/*!
1463 \since 5.1
1464
1465 Returns a list of captureCount() + 1 elements, containing the names of the
1466 named capturing groups in the pattern string. The list is sorted such that
1467 the element of the list at position \c{i} is the name of the \c{i}-th
1468 capturing group, if it has a name, or an empty string if that capturing
1469 group is unnamed.
1470
1471 For instance, given the regular expression
1472
1473 \snippet code/src_corelib_text_qregularexpression.cpp 32
1474
1475 namedCaptureGroups() will return the following list:
1476
1477 \snippet code/src_corelib_text_qregularexpression.cpp 33
1478
1479 which corresponds to the fact that the capturing group #0 (corresponding to
1480 the whole match) has no name, the capturing group #1 has name "day", the
1481 capturing group #2 has name "month", etc.
1482
1483 If the regular expression is not valid, returns an empty list.
1484
1485 \sa isValid(), QRegularExpressionMatch::captured(), QString::isEmpty()
1486*/
1487QStringList QRegularExpression::namedCaptureGroups() const
1488{
1489 if (!isValid()) // isValid() will compile the pattern
1490 return QStringList();
1491
1492 // namedCapturingTable will point to a table of
1493 // namedCapturingTableEntryCount entries, each one of which
1494 // contains one ushort followed by the name, NUL terminated.
1495 // The ushort is the numerical index of the name in the pattern.
1496 // The length of each entry is namedCapturingTableEntrySize.
1497 PCRE2_SPTR16 *namedCapturingTable;
1498 unsigned int namedCapturingTableEntryCount;
1499 unsigned int namedCapturingTableEntrySize;
1500
1501 pcre2_pattern_info_16(d->compiledPattern, PCRE2_INFO_NAMETABLE, &namedCapturingTable);
1502 pcre2_pattern_info_16(d->compiledPattern, PCRE2_INFO_NAMECOUNT, &namedCapturingTableEntryCount);
1503 pcre2_pattern_info_16(d->compiledPattern, PCRE2_INFO_NAMEENTRYSIZE, &namedCapturingTableEntrySize);
1504
1505 // The +1 is for the implicit group #0
1506 QStringList result(d->capturingCount + 1);
1507
1508 for (unsigned int i = 0; i < namedCapturingTableEntryCount; ++i) {
1509 const auto currentNamedCapturingTableRow =
1510 reinterpret_cast<const char16_t *>(namedCapturingTable) + namedCapturingTableEntrySize * i;
1511
1512 const int index = *currentNamedCapturingTableRow;
1513 result[index] = QStringView(currentNamedCapturingTableRow + 1).toString();
1514 }
1515
1516 return result;
1517}
1518
1519/*!
1520 Returns \c true if the regular expression is a valid regular expression (that
1521 is, it contains no syntax errors, etc.), or \c false otherwise. Use
1522 errorString() to obtain a textual description of the error.
1523
1524 \sa errorString(), patternErrorOffset()
1525*/
1526bool QRegularExpression::isValid() const
1527{
1528 d.data()->compilePattern();
1529 return d->compiledPattern;
1530}
1531
1532/*!
1533 Returns a textual description of the error found when checking the validity
1534 of the regular expression, or "no error" if no error was found.
1535
1536 \sa isValid(), patternErrorOffset()
1537*/
1538QString QRegularExpression::errorString() const
1539{
1540 d.data()->compilePattern();
1541 if (d->errorCode) {
1542 QString errorString;
1543 int errorStringLength;
1544 do {
1545 errorString.resize(errorString.size() + 64);
1546 errorStringLength = pcre2_get_error_message_16(d->errorCode,
1547 reinterpret_cast<ushort *>(errorString.data()),
1548 errorString.size());
1549 } while (errorStringLength < 0);
1550 errorString.resize(errorStringLength);
1551
1552#ifdef QT_NO_TRANSLATION
1553 return errorString;
1554#else
1555 return QCoreApplication::translate("QRegularExpression", std::move(errorString).toLatin1().constData());
1556#endif
1557 }
1558#ifdef QT_NO_TRANSLATION
1559 return u"no error"_s;
1560#else
1561 return QCoreApplication::translate("QRegularExpression", "no error");
1562#endif
1563}
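The loop in errorString() grows the buffer in fixed steps until the message fits, then trims it to the reported length. A minimal standard C++ sketch of the same idea follows; `copyMessage()` is a hypothetical stand-in for a C-style getter that returns a negative value when the buffer is too small, and `fetchMessage()` is likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style getter: copies 'message' plus a terminator into
// 'buffer' and returns the message length, or -1 if 'capacity' is too small.
int copyMessage(const std::string &message, char *buffer, std::size_t capacity)
{
    if (message.size() + 1 > capacity)
        return -1;
    message.copy(buffer, message.size());
    buffer[message.size()] = '\0';
    return int(message.size());
}

// Grows the output buffer by 64 characters per attempt until the getter
// succeeds, then shrinks it to the exact message length.
std::string fetchMessage(const std::string &message)
{
    std::string out;
    int length;
    do {
        out.resize(out.size() + 64);
        length = copyMessage(message, out.data(), out.size());
    } while (length < 0);
    out.resize(std::size_t(length));
    return out;
}
```

The loop only terminates if every negative return really means "buffer too small", which is the assumption built into this sketch.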
1564
1565/*!
1566 Returns the offset, inside the pattern string, at which an error was found
1567 when checking the validity of the regular expression. If no error was
1568 found, then -1 is returned.
1569
1570 \sa pattern(), isValid(), errorString()
1571*/
1572qsizetype QRegularExpression::patternErrorOffset() const
1573{
1574 d.data()->compilePattern();
1575 return d->errorOffset;
1576}
1577
1578/*!
1579 Attempts to match the regular expression against the given \a subject
1580 string, starting at the position \a offset inside the subject, using a
1581 match of type \a matchType and honoring the given \a matchOptions.
1582
1583 The returned QRegularExpressionMatch object contains the results of the
1584 match.
1585
1586 \sa QRegularExpressionMatch, {normal matching}
1587*/
1588QRegularExpressionMatch QRegularExpression::match(const QString &subject,
1589 qsizetype offset,
1590 MatchType matchType,
1591 MatchOptions matchOptions) const
1592{
1593 d.data()->compilePattern();
1594 auto priv = new QRegularExpressionMatchPrivate(*this,
1595 subject,
1596 QStringView(subject),
1597 matchType,
1598 matchOptions);
1599 d->doMatch(priv, offset);
1600 return QRegularExpressionMatch(*priv);
1601}
1602
1603#if QT_DEPRECATED_SINCE(6, 8)
1604/*!
1605 \since 6.0
1606 \overload
1607 \obsolete
1608
1609 Use matchView() instead.
1610*/
1611QRegularExpressionMatch QRegularExpression::match(QStringView subjectView,
1612 qsizetype offset,
1613 MatchType matchType,
1614 MatchOptions matchOptions) const
1615{
1616 return matchView(subjectView, offset, matchType, matchOptions);
1617}
1618#endif // QT_DEPRECATED_SINCE(6, 8)
1619
1620/*!
1621 \since 6.5
1622 \overload
1623
1624 Attempts to match the regular expression against the given \a subjectView
1625 string view, starting at the position \a offset inside the subject, using a
1626 match of type \a matchType and honoring the given \a matchOptions.
1627
1628 The returned QRegularExpressionMatch object contains the results of the
1629 match.
1630
1631 \note The data referenced by \a subjectView must remain valid as long
1632 as there are QRegularExpressionMatch objects using it.
1633
1634 \sa QRegularExpressionMatch, {normal matching}
1635*/
1636QRegularExpressionMatch QRegularExpression::matchView(QStringView subjectView,
1637 qsizetype offset,
1638 MatchType matchType,
1639 MatchOptions matchOptions) const
1640{
1641 d.data()->compilePattern();
1642 auto priv = new QRegularExpressionMatchPrivate(*this,
1643 QString(),
1644 subjectView,
1645 matchType,
1646 matchOptions);
1647 d->doMatch(priv, offset);
1648 return QRegularExpressionMatch(*priv);
1649}
1650
1651/*!
1652 Attempts to perform a global match of the regular expression against the
1653 given \a subject string, starting at the position \a offset inside the
1654 subject, using a match of type \a matchType and honoring the given \a
1655 matchOptions.
1656
1657 The returned QRegularExpressionMatchIterator is positioned before the
1658 first match result (if any).
1659
1660 \sa QRegularExpressionMatchIterator, {global matching}
1661*/
1662QRegularExpressionMatchIterator QRegularExpression::globalMatch(const QString &subject,
1663 qsizetype offset,
1664 MatchType matchType,
1665 MatchOptions matchOptions) const
1666{
1667 QRegularExpressionMatchIteratorPrivate *priv =
1668 new QRegularExpressionMatchIteratorPrivate(*this,
1669 matchType,
1670 matchOptions,
1671 match(subject, offset, matchType, matchOptions));
1672
1673 return QRegularExpressionMatchIterator(*priv);
1674}
1675
1676#if QT_DEPRECATED_SINCE(6, 8)
1677/*!
1678 \since 6.0
1679 \overload
1680 \obsolete
1681
1682 Use globalMatchView() instead.
1683*/
1684QRegularExpressionMatchIterator QRegularExpression::globalMatch(QStringView subjectView,
1685 qsizetype offset,
1686 MatchType matchType,
1687 MatchOptions matchOptions) const
1688{
1689 return globalMatchView(subjectView, offset, matchType, matchOptions);
1690}
1691#endif // QT_DEPRECATED_SINCE(6, 8)
1692
1693/*!
1694 \since 6.5
1695 \overload
1696
1697 Attempts to perform a global match of the regular expression against the
1698 given \a subjectView string view, starting at the position \a offset inside the
1699 subject, using a match of type \a matchType and honoring the given \a
1700 matchOptions.
1701
1702 The returned QRegularExpressionMatchIterator is positioned before the
1703 first match result (if any).
1704
1705 \note The data referenced by \a subjectView must remain valid as
1706 long as there are QRegularExpressionMatchIterator or
1707 QRegularExpressionMatch objects using it.
1708
1709 \sa QRegularExpressionMatchIterator, {global matching}
1710*/
1711QRegularExpressionMatchIterator QRegularExpression::globalMatchView(QStringView subjectView,
1712 qsizetype offset,
1713 MatchType matchType,
1714 MatchOptions matchOptions) const
1715{
1716 QRegularExpressionMatchIteratorPrivate *priv =
1717 new QRegularExpressionMatchIteratorPrivate(*this,
1718 matchType,
1719 matchOptions,
1720 matchView(subjectView, offset, matchType, matchOptions));
1721
1722 return QRegularExpressionMatchIterator(*priv);
1723}
1724
1725/*!
1726 \since 5.4
1727
1728 Compiles the pattern immediately, including JIT compiling it (if
1729 the JIT is enabled) for optimization.
1730
1731 \sa isValid(), {Debugging Code that Uses QRegularExpression}
1732*/
1733void QRegularExpression::optimize() const
1734{
1735 d.data()->compilePattern();
1736}
1737
1738/*!
1739 \fn bool QRegularExpression::operator==(const QRegularExpression &lhs, const QRegularExpression &rhs) noexcept
1740
1741 Returns \c true if the \a lhs regular expression is equal to the \a rhs, or \c false
1742 otherwise. Two QRegularExpression objects are equal if they have
1743 the same pattern string and the same pattern options.
1744
1745 \sa operator!=()
1746*/
1747bool comparesEqual(const QRegularExpression &lhs,
1748 const QRegularExpression &rhs) noexcept
1749{
1750 return (lhs.d == rhs.d) ||
1751 (lhs.d->pattern == rhs.d->pattern && lhs.d->patternOptions == rhs.d->patternOptions);
1752}
1753/*!
1754 \fn QRegularExpression & QRegularExpression::operator=(QRegularExpression && re)
1755
1756 Move-assigns the regular expression \a re to this object, and returns a
1757 reference to the result. Both the pattern and the pattern options are moved.
1758
1759 Note that a moved-from QRegularExpression can only be destroyed or
1760 assigned to. The effect of calling functions other than the destructor
1761 or one of the assignment operators is undefined.
1762*/
1763
1764/*!
1765 \fn bool QRegularExpression::operator!=(const QRegularExpression &lhs, const QRegularExpression &rhs) noexcept
1766
1767 Returns \c true if the \a lhs regular expression is different from the \a rhs, or
1768 \c false otherwise.
1769
1770 \sa operator==()
1771*/
1772
1773/*!
1774 \since 5.6
1775 \relates QRegularExpression
1776
1777 Returns the hash value for \a key, using
1778 \a seed to seed the calculation.
1779*/
1780size_t qHash(const QRegularExpression &key, size_t seed) noexcept
1781{
1782 return qHashMulti(seed, key.d->pattern, key.d->patternOptions);
1783}
1784
1785/*!
1786 \fn QString QRegularExpression::escape(const QString &str)
1787 \overload
1788*/
1789
1790/*!
1791 \since 5.15
1792
1793 Escapes all characters of \a str so that they no longer have any special
1794 meaning when used as a regular expression pattern string, and returns
1795 the escaped string. For instance:
1796
1797 \snippet code/src_corelib_text_qregularexpression.cpp 26
1798
1799 This is very convenient in order to build patterns from arbitrary strings:
1800
1801 \snippet code/src_corelib_text_qregularexpression.cpp 27
1802
1803 \note This function implements Perl's quotemeta algorithm and escapes with
1804 a backslash all characters in \a str, except for the characters in the
1805 \c{[A-Z]}, \c{[a-z]} and \c{[0-9]} ranges, as well as the underscore
1806 (\c{_}) character. The only difference with Perl is that a literal NUL
1807 inside \a str is escaped with the sequence \c{"\\0"} (backslash +
1808 \c{'0'}), instead of \c{"\\\0"} (backslash + \c{NUL}).
1809*/
1810QString QRegularExpression::escape(QStringView str)
1811{
1812 QString result;
1813 const qsizetype count = str.size();
1814 result.reserve(count * 2);
1815
1816 // everything but [a-zA-Z0-9_] gets escaped,
1817 // cf. perldoc -f quotemeta
1818 for (qsizetype i = 0; i < count; ++i) {
1819 const QChar current = str.at(i);
1820
1821 if (current == QChar::Null) {
1822 // unlike Perl, a literal NUL must be escaped with
1823 // "\\0" (backslash + 0) and not "\\\0" (backslash + NUL),
1824 // because pcre16_compile uses a NUL-terminated string
1825 result.append(u'\\');
1826 result.append(u'0');
1827 } else if ((current < u'a' || current > u'z') &&
1828 (current < u'A' || current > u'Z') &&
1829 (current < u'0' || current > u'9') &&
1830 current != u'_') {
1831 result.append(u'\\');
1832 result.append(current);
1833 if (current.isHighSurrogate() && i < (count - 1))
1834 result.append(str.at(++i));
1835 } else {
1836 result.append(current);
1837 }
1838 }
1839
1840 result.squeeze();
1841 return result;
1842}
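The quotemeta rule used by escape() can be tried outside Qt. The following is a minimal sketch, not the Qt implementation: it assumes a plain std::u16string in place of QStringView, and the name quotemetaSketch is hypothetical. It leaves [A-Za-z0-9_] untouched, writes a NUL as backslash + '0', and keeps a surrogate pair together behind a single backslash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-alone helper (not Qt API): escapes every UTF-16 code
// unit except [A-Za-z0-9_], following the rule documented for escape().
std::u16string quotemetaSketch(const std::u16string &in)
{
    std::u16string out;
    out.reserve(in.size() * 2);

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t c = in[i];
        const bool isWord = (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z')
                         || (c >= u'0' && c <= u'9') || c == u'_';

        if (c == u'\0') {
            // backslash + '0', never backslash + NUL
            out += u"\\0";
        } else if (!isWord) {
            out += u'\\';
            out += c;
            // keep a surrogate pair together behind one backslash
            const bool highSurrogate = c >= 0xD800 && c <= 0xDBFF;
            if (highSurrogate && i + 1 < in.size())
                out += in[++i];
        } else {
            out += c;
        }
    }
    return out;
}
```

For instance, quotemetaSketch(u"a.b_c") yields the five-character input with a backslash inserted before the dot only.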
1843
1844/*!
1845 \since 5.12
1846 \fn QString QRegularExpression::wildcardToRegularExpression(const QString &pattern, WildcardConversionOptions options)
1847 \overload
1848*/
1849
1850/*!
1851 \since 6.0
1852 \enum QRegularExpression::WildcardConversionOption
1853
1854 The WildcardConversionOption enum defines modifiers to the way a wildcard glob
1855 pattern gets converted to a regular expression pattern.
1856
1857 \value DefaultWildcardConversion
1858 No conversion options are set.
1859
1860 \value UnanchoredWildcardConversion
1861 The conversion will not anchor the pattern. This allows for partial string matches of
1862 wildcard expressions.
1863
1864 \value [since 6.6] NonPathWildcardConversion
1865 The conversion will \e{not} interpret the pattern as filepath globbing.
1866
1867 \sa QRegularExpression::wildcardToRegularExpression
1868*/
1869
1870/*!
1871 \since 5.15
1872
1873 Returns a regular expression representation of the given glob \a pattern.
1874
1875 There are two transformations possible, one that targets file path
1876 globbing, and another one which is more generic.
1877
1878 By default, the transformation is targeting file path globbing,
1879 which means in particular that path separators receive special
1880 treatment. This implies that it is not just a basic translation
1881 from "*" to ".*" and similar.
1882
1883 \snippet code/src_corelib_text_qregularexpression.cpp 31
1884
1885 The more generic globbing transformation is available by passing
1886 \c NonPathWildcardConversion in the conversion \a options.
1887
1888 This implementation closely follows the definition
1889 of wildcards for glob patterns:
1890 \table
1891 \row \li \b{c}
1892 \li Any character represents itself apart from those mentioned
1893 below. Thus \b{c} matches the character \e c.
1894 \row \li \b{?}
1895 \li Matches any single character, except for a path separator
1896 (in case file path globbing has been selected). It is the
1897 same as \b{.} in full regexps.
1898 \row \li \b{*}
1899 \li Matches zero or more of any characters, except for path
1900 separators (in case file path globbing has been selected). It is the
1901 same as \b{.*} in full regexps.
1902 \row \li \b{[abc]}
1903 \li Matches one character given in the bracket.
1904 \row \li \b{[a-c]}
1905 \li Matches one character from the range given in the bracket.
1906 \row \li \b{[!abc]}
1907 \li Matches one character that is not given in the bracket. It is the
1908 same as \b{[^abc]} in full regexp.
1909 \row \li \b{[!a-c]}
1910 \li Matches one character that is not from the range given in the
1911 bracket. It is the same as \b{[^a-c]} in full regexp.
1912 \endtable
1913
1914 \note For historical reasons, a backslash (\\) character is \e not
1915 an escape char in this context. In order to match one of the
1916 special characters, place it in square brackets (for example,
1917 \c{[?]}).
1918
1919 More information about the implementation can be found in:
1920 \list
1921 \li \l {https://en.wikipedia.org/wiki/Glob_(programming)} {The Wikipedia Glob article}
1922 \li \c {man 7 glob}
1923 \endlist
1924
1925 By default, the returned regular expression is fully anchored. In other
1926 words, there is no need to call anchoredPattern() again on the
1927 result. To get a regular expression that is not anchored, pass
1928 UnanchoredWildcardConversion in the conversion \a options.
1929
1930 \sa escape()
1931*/
1932QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, WildcardConversionOptions options)
1933{
1934 const qsizetype wclen = pattern.size();
1935 QString rx;
1936 rx.reserve(wclen + wclen / 16);
1937 qsizetype i = 0;
1938 const QChar *wc = pattern.data();
1939
1940 struct GlobSettings {
1941 char16_t nativePathSeparator;
1942 QStringView starEscape;
1943 QStringView questionMarkEscape;
1944 };
1945
1946 const GlobSettings settings = [options]() {
1947 if (options.testFlag(NonPathWildcardConversion)) {
1948 return GlobSettings{ u'\0', u".*", u"." };
1949 } else {
1950#ifdef Q_OS_WIN
1951 return GlobSettings{ u'\\', u"[^/\\\\]*", u"[^/\\\\]" };
1952#else
1953 return GlobSettings{ u'/', u"[^/]*", u"[^/]" };
1954#endif
1955 }
1956 }();
1957
1958 // We want a dot to match everything (incl. newlines), so enable /s mode,
1959 // limited to the pattern string we're producing.
1960 rx += u"(?s:";
1961
1962 while (i < wclen) {
1963 const QChar c = wc[i++];
1964 switch (c.unicode()) {
1965 case '*':
1966 rx += settings.starEscape;
1967 // Coalesce sequences of *
1968 while (i < wclen && wc[i] == u'*')
1969 ++i;
1970 break;
1971 case '?':
1972 rx += settings.questionMarkEscape;
1973 break;
1974 // When not using filepath globbing: \ is escaped, / is itself
1975 // When using filepath globbing:
1976 // * Unix: \ gets escaped. / is itself
1977 // * Windows: \ and / can match each other -- they become [/\\] in regexp
1978 case '\\':
1979#ifdef Q_OS_WIN
1980 if (options.testFlag(NonPathWildcardConversion))
1981 rx += u"\\\\";
1982 else
1983 rx += u"[/\\\\]";
1984 break;
1985 case '/':
1986 if (options.testFlag(NonPathWildcardConversion))
1987 rx += u'/';
1988 else
1989 rx += u"[/\\\\]";
1990 break;
1991#endif
1992 case '$':
1993 case '(':
1994 case ')':
1995 case '+':
1996 case '.':
1997 case '^':
1998 case '{':
1999 case '|':
2000 case '}':
2001 rx += u'\\';
2002 rx += c;
2003 break;
2004 case '[':
2005 rx += c;
2006 // Support for the [!abc] or [!a-c] syntax
2007 if (i < wclen) {
2008 if (wc[i] == u'!') {
2009 rx += u'^';
2010 ++i;
2011 }
2012
2013 if (i < wclen && wc[i] == u']')
2014 rx += wc[i++];
2015
2016 while (i < wclen && wc[i] != u']') {
2017 if (!options.testFlag(NonPathWildcardConversion)) {
2018 // A '/' inside a character class makes the glob invalid for file
2019 // path globbing, so stop here and return the incomplete pattern.
2020 // The same applies to '\\' on Windows.
2021 if (wc[i] == u'/' || wc[i] == settings.nativePathSeparator)
2022 return rx;
2023 }
2024 if (wc[i] == u'\\')
2025 rx += u'\\';
2026 rx += wc[i++];
2027 }
2028 }
2029 break;
2030 default:
2031 rx += c;
2032 break;
2033 }
2034 }
2035
2036 // Closes the (?s: group opened above
2037 rx += u")";
2038
2039 if (!(options & UnanchoredWildcardConversion))
2040 rx = anchoredPattern(rx);
2041
2042 return rx;
2043}
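The translation loop above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the Qt function: globToRegexSketch and globMatchesSketch are hypothetical names, only the Unix separator is handled, a '/' inside brackets is not rejected, and because std::regex (ECMAScript grammar) has no (?s:...) group or \A and \z anchors, the sketch uses [\s\S] for "any character" and relies on std::regex_match for anchoring.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not Qt API): converts a glob to an ECMAScript regex.
// pathMode == true mimics file path globbing with '/' as the separator.
std::string globToRegexSketch(const std::string &glob, bool pathMode = true)
{
    const std::string star = pathMode ? "[^/]*" : "[\\s\\S]*";
    const std::string one = pathMode ? "[^/]" : "[\\s\\S]";
    std::string rx;
    std::size_t i = 0;

    while (i < glob.size()) {
        const char c = glob[i++];
        switch (c) {
        case '*':
            rx += star;
            while (i < glob.size() && glob[i] == '*') // coalesce runs of '*'
                ++i;
            break;
        case '?':
            rx += one;
            break;
        case '\\': case '$': case '(': case ')': case '+':
        case '.': case '^': case '{': case '|': case '}':
            rx += '\\'; // regex metacharacters are taken literally in a glob
            rx += c;
            break;
        case '[':
            rx += c;
            if (i < glob.size() && glob[i] == '!') { // [!abc] becomes [^abc]
                rx += '^';
                ++i;
            }
            // copy the class body; the closing ']' is emitted by the default case
            while (i < glob.size() && glob[i] != ']') {
                if (glob[i] == '\\')
                    rx += '\\';
                rx += glob[i++];
            }
            break;
        default:
            rx += c;
            break;
        }
    }
    return rx;
}

// std::regex_match only succeeds on a full match, which stands in for anchoring.
bool globMatchesSketch(const std::string &glob, const std::string &subject,
                       bool pathMode = true)
{
    return std::regex_match(subject, std::regex(globToRegexSketch(glob, pathMode)));
}
```

In path mode "*.txt" matches "notes.txt" but not "dir/notes.txt"; with pathMode set to false the star is allowed to cross the separator.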
2044
2045/*!
2046 \since 6.0
2047 Returns a regular expression of the glob pattern \a pattern. The regular expression
2048 will be case sensitive if \a cs is \l{Qt::CaseSensitive}, and converted according to
2049 \a options.
2050
2051 Equivalent to
2052 \code
2053 auto reOptions = cs == Qt::CaseSensitive ? QRegularExpression::NoPatternOption :
2054 QRegularExpression::CaseInsensitiveOption;
2055 return QRegularExpression(wildcardToRegularExpression(pattern, options), reOptions);
2056 \endcode
2057*/
2058QRegularExpression QRegularExpression::fromWildcard(QStringView pattern, Qt::CaseSensitivity cs,
2059 WildcardConversionOptions options)
2060{
2061 auto reOptions = cs == Qt::CaseSensitive ? QRegularExpression::NoPatternOption :
2062 QRegularExpression::CaseInsensitiveOption;
2063 return QRegularExpression(wildcardToRegularExpression(pattern, options), reOptions);
2064}
2065
2066/*!
2067 \fn QRegularExpression::anchoredPattern(const QString &expression)
2068 \since 5.12
2069 \overload
2070*/
2071
2072/*!
2073 \since 5.15
2074
2075 Returns the \a expression wrapped between the \c{\A} and \c{\z} anchors to
2076 be used for exact matching.
2077*/
2078QString QRegularExpression::anchoredPattern(QStringView expression)
2079{
2080 return QString()
2081 + "\\A(?:"_L1
2082 + expression
2083 + ")\\z"_L1;
2084}
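The non-capturing group that anchoredPattern() places around the expression matters as soon as the expression contains a top-level alternation. The sketch below illustrates this with std::regex; since the ECMAScript grammar lacks \A and \z, it uses ^ and $ as stand-ins, and all function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helpers (not Qt API). '^' and '$' stand in for \A and \z.
std::string anchoredSketch(const std::string &expression)
{
    return "^(?:" + expression + ")$"; // the group keeps alternations inside the anchors
}

std::string naiveAnchoredSketch(const std::string &expression)
{
    return "^" + expression + "$"; // "^cat|dog$" parses as (^cat) or (dog$)
}

bool foundSketch(const std::string &pattern, const std::string &subject)
{
    return std::regex_search(subject, std::regex(pattern));
}
```

With the expression "cat|dog", the naive form still finds a match in "catalog" because only the first alternative is tied to the start; the grouped form does not.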
2085
2086/*!
2087 \since 5.1
2088
2089 Constructs a valid, empty QRegularExpressionMatch object. The regular
2090 expression is set to a default-constructed one; the match type to
2091 QRegularExpression::NoMatch and the match options to
2092 QRegularExpression::NoMatchOption.
2093
2094 The object will report no match through the hasMatch() and the
2095 hasPartialMatch() member functions.
2096*/
2097QRegularExpressionMatch::QRegularExpressionMatch()
2098 : d(new QRegularExpressionMatchPrivate(QRegularExpression(),
2099 QString(),
2100 QStringView(),
2101 QRegularExpression::NoMatch,
2102 QRegularExpression::NoMatchOption))
2103{
2104 d->isValid = true;
2105}
2106
2107/*!
2108 Destroys the match result.
2109*/
2110QRegularExpressionMatch::~QRegularExpressionMatch()
2111{
2112}
2113
2114QT_DEFINE_QESDP_SPECIALIZATION_DTOR(QRegularExpressionMatchPrivate)
2115
2116/*!
2117 Constructs a match result by copying the result of the given \a match.
2118
2119 \sa operator=()
2120*/
2121QRegularExpressionMatch::QRegularExpressionMatch(const QRegularExpressionMatch &match)
2122 : d(match.d)
2123{
2124}
2125
2126/*!
2127 \fn QRegularExpressionMatch::QRegularExpressionMatch(QRegularExpressionMatch &&match)
2128
2129 \since 6.1
2130
2131 Constructs a match result by moving the result from the given \a match.
2132
2133 Note that a moved-from QRegularExpressionMatch can only be destroyed or
2134 assigned to. The effect of calling other functions than the destructor
2135 or one of the assignment operators is undefined.
2136
2137 \sa operator=()
2138*/
2139
2140/*!
2141 Assigns the match result \a match to this object, and returns a reference
2142 to the copy.
2143*/
2144QRegularExpressionMatch &QRegularExpressionMatch::operator=(const QRegularExpressionMatch &match)
2145{
2146 d = match.d;
2147 return *this;
2148}
2149
2150/*!
2151 \fn QRegularExpressionMatch &QRegularExpressionMatch::operator=(QRegularExpressionMatch &&match)
2152
2153 Move-assigns the match result \a match to this object, and returns a
2154 reference to the result.
2155
2156 Note that a moved-from QRegularExpressionMatch can only be destroyed or
2157 assigned to. The effect of calling other functions than the destructor
2158 or one of the assignment operators is undefined.
2159*/
2160
2161/*!
2162 \fn void QRegularExpressionMatch::swap(QRegularExpressionMatch &other)
2163 \memberswap{match result}
2164*/
2165
2166/*!
2167 \internal
2168*/
2169QRegularExpressionMatch::QRegularExpressionMatch(QRegularExpressionMatchPrivate &dd)
2170 : d(&dd)
2171{
2172}
2173
2174/*!
2175 Returns the QRegularExpression object whose match() function returned this
2176 object.
2177
2178 \sa QRegularExpression::match(), matchType(), matchOptions()
2179*/
2180QRegularExpression QRegularExpressionMatch::regularExpression() const
2181{
2182 return d->regularExpression;
2183}
2184
2185
2186/*!
2187 Returns the match type that was used to get this QRegularExpressionMatch
2188 object, that is, the match type that was passed to
2189 QRegularExpression::match() or QRegularExpression::globalMatch().
2190
2191 \sa QRegularExpression::match(), regularExpression(), matchOptions()
2192*/
2193QRegularExpression::MatchType QRegularExpressionMatch::matchType() const
2194{
2195 return d->matchType;
2196}
2197
2198/*!
2199 Returns the match options that were used to get this
2200 QRegularExpressionMatch object, that is, the match options that were passed
2201 to QRegularExpression::match() or QRegularExpression::globalMatch().
2202
2203 \sa QRegularExpression::match(), regularExpression(), matchType()
2204*/
2205QRegularExpression::MatchOptions QRegularExpressionMatch::matchOptions() const
2206{
2207 return d->matchOptions;
2208}
2209
2210/*!
2211 Returns the index of the last capturing group that captured something,
2212 including the implicit capturing group 0. This can be used to extract all
2213 the substrings that were captured:
2214
2215 \snippet code/src_corelib_text_qregularexpression.cpp 28
2216
2217 Note that some of the capturing groups with an index less than
2218 lastCapturedIndex() might not have matched, and therefore captured nothing.
2219
2220 If the regular expression did not match, this function returns -1.
2221
2222 \sa hasCaptured(), captured(), capturedStart(), capturedEnd(), capturedLength()
2223*/
2224int QRegularExpressionMatch::lastCapturedIndex() const
2225{
2226 return d->capturedCount - 1;
2227}
2228
2229/*!
2230 \fn bool QRegularExpressionMatch::hasCaptured(QAnyStringView name) const
2231 \since 6.3
2232
2233 Returns true if the capturing group named \a name captured something
2234 in the subject string, and false otherwise (or if there is no
2235 capturing group called \a name).
2236
2237 \note Some capturing groups in a regular expression may not have
2238 captured anything even if the regular expression matched. This may
2239 happen, for instance, if a conditional operator is used in the
2240 pattern:
2241
2242 \snippet code/src_corelib_text_qregularexpression.cpp 36
2243
2244 Similarly, a capturing group may capture a substring of length 0;
2245 this function will return \c{true} for such a capturing group.
2246
2247 \note In Qt versions prior to 6.8, this function took QString or
2248 QStringView, not QAnyStringView.
2249
2250 \sa captured(), hasMatch()
2251*/
2252bool QRegularExpressionMatch::hasCaptured(QAnyStringView name) const
2253{
2254 const int nth = d->regularExpression.d->captureIndexForName(name);
2255 return hasCaptured(nth);
2256}
2257
2258/*!
2259 \since 6.3
2260
2261 Returns true if the \a nth capturing group captured something
2262 in the subject string, and false otherwise (or if there is no
2263 such capturing group).
2264
2265 \note The implicit capturing group number 0 captures the substring
2266 matched by the entire pattern.
2267
2268 \note Some capturing groups in a regular expression may not have
2269 captured anything even if the regular expression matched. This may
2270 happen, for instance, if a conditional operator is used in the
2271 pattern:
2272
2273 \snippet code/src_corelib_text_qregularexpression.cpp 36
2274
2275 Similarly, a capturing group may capture a substring of length 0;
2276 this function will return \c{true} for such a capturing group.
2277
2278 \sa captured(), lastCapturedIndex(), hasMatch()
2279*/
2280bool QRegularExpressionMatch::hasCaptured(int nth) const
2281{
2282 if (nth < 0 || nth > lastCapturedIndex())
2283 return false;
2284
2285 return d->capturedOffsets.at(nth * 2) != -1;
2286}
2287
2288/*!
2289 Returns the substring captured by the \a nth capturing group.
2290
2291 If the \a nth capturing group did not capture a string, or if there is no
2292 such capturing group, returns a null QString.
2293
2294 \note The implicit capturing group number 0 captures the substring matched
2295 by the entire pattern.
2296
2297 \sa capturedView(), lastCapturedIndex(), capturedStart(), capturedEnd(),
2298 capturedLength(), QString::isNull()
2299*/
2300QString QRegularExpressionMatch::captured(int nth) const
2301{
2302 return capturedView(nth).toString();
2303}
2304
2305/*!
2306 \since 5.10
2307
2308 Returns a view of the substring captured by the \a nth capturing group.
2309
2310 If the \a nth capturing group did not capture a string, or if there is no
2311 such capturing group, returns a null QStringView.
2312
2313 \note The implicit capturing group number 0 captures the substring matched
2314 by the entire pattern.
2315
2316 \sa captured(), lastCapturedIndex(), capturedStart(), capturedEnd(),
2317 capturedLength(), QStringView::isNull()
2318*/
2319QStringView QRegularExpressionMatch::capturedView(int nth) const
2320{
2321 if (!hasCaptured(nth))
2322 return QStringView();
2323
2324 qsizetype start = capturedStart(nth);
2325
2326 if (start == -1) // didn't capture
2327 return QStringView();
2328
2329 return d->subject.mid(start, capturedLength(nth));
2330}
2331
2332/*!
2333 \since 5.10
2334
2335 Returns the substring captured by the capturing group named \a name.
2336
2337 If the named capturing group \a name did not capture a string, or if
2338 there is no capturing group named \a name, returns a null QString.
2339
2340 \note In Qt versions prior to 6.8, this function took QString or
2341 QStringView, not QAnyStringView.
2342
2343 \sa capturedView(), capturedStart(), capturedEnd(), capturedLength(),
2344 QString::isNull()
2345*/
2346QString QRegularExpressionMatch::captured(QAnyStringView name) const
2347{
2348 if (name.isEmpty()) {
2349 qWarning("QRegularExpressionMatch::captured: empty capturing group name passed");
2350 return QString();
2351 }
2352
2353 return capturedView(name).toString();
2354}
2355
2356/*!
2357 \since 5.10
2358
2359 Returns a view of the string captured by the capturing group named \a
2360 name.
2361
2362 If the named capturing group \a name did not capture a string, or if
2363 there is no capturing group named \a name, returns a null QStringView.
2364
2365 \note In Qt versions prior to 6.8, this function took QString or
2366 QStringView, not QAnyStringView.
2367
2368 \sa captured(), capturedStart(), capturedEnd(), capturedLength(),
2369 QStringView::isNull()
2370*/
2371QStringView QRegularExpressionMatch::capturedView(QAnyStringView name) const
2372{
2373 if (name.isEmpty()) {
2374 qWarning("QRegularExpressionMatch::capturedView: empty capturing group name passed");
2375 return QStringView();
2376 }
2377 int nth = d->regularExpression.d->captureIndexForName(name);
2378 if (nth == -1)
2379 return QStringView();
2380 return capturedView(nth);
2381}
2382
2383/*!
2384 Returns a list of all strings captured by capturing groups, in the order
2385 the groups themselves appear in the pattern string. The list includes the
2386 implicit capturing group number 0, capturing the substring matched by the
2387 entire pattern.
2388*/
2389QStringList QRegularExpressionMatch::capturedTexts() const
2390{
2391 QStringList texts;
2392 texts.reserve(d->capturedCount);
2393 for (int i = 0; i < d->capturedCount; ++i)
2394 texts << captured(i);
2395 return texts;
2396}
2397
2398/*!
2399 Returns the offset inside the subject string corresponding to the
2400 starting position of the substring captured by the \a nth capturing group.
2401 If the \a nth capturing group did not capture a string or doesn't exist,
2402 returns -1.
2403
2404 \sa capturedEnd(), capturedLength(), captured()
2405*/
2406qsizetype QRegularExpressionMatch::capturedStart(int nth) const
2407{
2408 if (!hasCaptured(nth))
2409 return -1;
2410
2411 return d->capturedOffsets.at(nth * 2);
2412}
2413
2414/*!
2415 Returns the length of the substring captured by the \a nth capturing group.
2416
2417 \note This function returns 0 if the \a nth capturing group did not capture
2418 a string or doesn't exist.
2419
2420 \sa capturedStart(), capturedEnd(), captured()
2421*/
2422qsizetype QRegularExpressionMatch::capturedLength(int nth) const
2423{
2424 // bound checking performed by these two functions
2425 return capturedEnd(nth) - capturedStart(nth);
2426}
2427
2428/*!
2429 Returns the offset inside the subject string immediately after the ending
2430 position of the substring captured by the \a nth capturing group. If the \a
2431 nth capturing group did not capture a string or doesn't exist, returns -1.
2432
2433 \sa capturedStart(), capturedLength(), captured()
2434*/
2435qsizetype QRegularExpressionMatch::capturedEnd(int nth) const
2436{
2437 if (!hasCaptured(nth))
2438 return -1;
2439
2440 return d->capturedOffsets.at(nth * 2 + 1);
2441}
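The accessors above all read from one flat offsets array in which group n stores its start at index 2n and its end at index 2n + 1, with -1 marking a group that did not capture. A minimal sketch of that layout follows; MatchSketch is a hypothetical stand-in that uses std::string and std::vector instead of the Qt private class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the match data (not Qt API).
struct MatchSketch {
    std::string subject;
    std::vector<std::ptrdiff_t> offsets; // [start0, end0, start1, end1, ...]

    int lastCapturedIndex() const { return static_cast<int>(offsets.size() / 2) - 1; }

    bool hasCaptured(int nth) const
    {
        if (nth < 0 || nth > lastCapturedIndex())
            return false;
        return offsets[nth * 2] != -1; // -1 means "group did not capture"
    }

    std::ptrdiff_t capturedStart(int nth) const
    {
        return hasCaptured(nth) ? offsets[nth * 2] : -1;
    }

    std::ptrdiff_t capturedEnd(int nth) const
    {
        return hasCaptured(nth) ? offsets[nth * 2 + 1] : -1;
    }

    // Both accessors return -1 for a missing group, so the difference is 0.
    std::ptrdiff_t capturedLength(int nth) const
    {
        return capturedEnd(nth) - capturedStart(nth);
    }

    std::string captured(int nth) const
    {
        if (!hasCaptured(nth))
            return std::string();
        return subject.substr(capturedStart(nth), capturedLength(nth));
    }
};
```

This also shows why capturedLength() needs no bounds check of its own: for an unset or out-of-range group both offsets are -1, and their difference is the documented 0.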
2442
2443/*!
2444 \since 5.10
2445
2446 Returns the offset inside the subject string corresponding to the starting
2447 position of the substring captured by the capturing group named \a name.
2448 If the capturing group named \a name did not capture a string or doesn't
2449 exist, returns -1.
2450
2451 \note In Qt versions prior to 6.8, this function took QString or
2452 QStringView, not QAnyStringView.
2453
2454 \sa capturedEnd(), capturedLength(), captured()
2455*/
2456qsizetype QRegularExpressionMatch::capturedStart(QAnyStringView name) const
2457{
2458 if (name.isEmpty()) {
2459 qWarning("QRegularExpressionMatch::capturedStart: empty capturing group name passed");
2460 return -1;
2461 }
2462 int nth = d->regularExpression.d->captureIndexForName(name);
2463 if (nth == -1)
2464 return -1;
2465 return capturedStart(nth);
2466}
2467
2468/*!
2469 \since 5.10
2470
2471 Returns the length of the substring captured by the capturing group named
2472 \a name.
2473
2474 \note This function returns 0 if the capturing group named \a name did not
2475 capture a string or doesn't exist.
2476
2477 \note In Qt versions prior to 6.8, this function took QString or
2478 QStringView, not QAnyStringView.
2479
2480 \sa capturedStart(), capturedEnd(), captured()
2481*/
2482qsizetype QRegularExpressionMatch::capturedLength(QAnyStringView name) const
2483{
2484 if (name.isEmpty()) {
2485 qWarning("QRegularExpressionMatch::capturedLength: empty capturing group name passed");
2486 return 0;
2487 }
2488 int nth = d->regularExpression.d->captureIndexForName(name);
2489 if (nth == -1)
2490 return 0;
2491 return capturedLength(nth);
2492}
2493
2494/*!
2495 \since 5.10
2496
2497 Returns the offset inside the subject string immediately after the ending
2498 position of the substring captured by the capturing group named \a name. If
2499 the capturing group named \a name did not capture a string or doesn't
2500 exist, returns -1.
2501
2502 \note In Qt versions prior to 6.8, this function took QString or
2503 QStringView, not QAnyStringView.
2504
2505 \sa capturedStart(), capturedLength(), captured()
2506*/
2507qsizetype QRegularExpressionMatch::capturedEnd(QAnyStringView name) const
2508{
2509 if (name.isEmpty()) {
2510 qWarning("QRegularExpressionMatch::capturedEnd: empty capturing group name passed");
2511 return -1;
2512 }
2513 int nth = d->regularExpression.d->captureIndexForName(name);
2514 if (nth == -1)
2515 return -1;
2516 return capturedEnd(nth);
2517}
2518
2519/*!
2520 Returns \c true if the regular expression matched against the subject string,
2521 or \c false otherwise.
2522
2523 \sa QRegularExpression::match(), hasPartialMatch()
2524*/
2525bool QRegularExpressionMatch::hasMatch() const
2526{
2527 return d->hasMatch;
2528}
2529
2530/*!
2531 Returns \c true if the regular expression partially matched against the
2532 subject string, or \c false otherwise.
2533
2534 \note Only a match that explicitly used one of the partial match types
2535 can yield a partial match. If such a match succeeds completely, however,
2536 this function will return \c false, while hasMatch() will return \c true.
2537
2538 \sa QRegularExpression::match(), QRegularExpression::MatchType, hasMatch()
2539*/
2540bool QRegularExpressionMatch::hasPartialMatch() const
2541{
2542 return d->hasPartialMatch;
2543}
2544
2545/*!
2546 Returns \c true if the match object was obtained as a result from the
2547 QRegularExpression::match() function invoked on a valid QRegularExpression
2548 object; returns \c false if the QRegularExpression was invalid.
2549
2550 \sa QRegularExpression::match(), QRegularExpression::isValid()
2551*/
2552bool QRegularExpressionMatch::isValid() const
2553{
2554 return d->isValid;
2555}
2556
2557/*!
2558 \internal
2559*/
2560QRegularExpressionMatchIterator::QRegularExpressionMatchIterator(QRegularExpressionMatchIteratorPrivate &dd)
2561 : d(&dd)
2562{
2563}
2564
2565/*!
2566 \since 5.1
2567
2568 Constructs an empty, valid QRegularExpressionMatchIterator object. The
2569 regular expression is set to a default-constructed one; the match type to
2570 QRegularExpression::NoMatch and the match options to
2571 QRegularExpression::NoMatchOption.
2572
2573 Invoking the hasNext() member function on the constructed object will
2574 return false, as the iterator is not iterating on a valid sequence of
2575 matches.
2576*/
2577QRegularExpressionMatchIterator::QRegularExpressionMatchIterator()
2578 : d(new QRegularExpressionMatchIteratorPrivate(QRegularExpression(),
2579 QRegularExpression::NoMatch,
2580 QRegularExpression::NoMatchOption,
2581 QRegularExpressionMatch()))
2582{
2583}
2584
2585/*!
2586 Destroys the QRegularExpressionMatchIterator object.
2587*/
2588QRegularExpressionMatchIterator::~QRegularExpressionMatchIterator()
2589{
2590}
2591
2592QT_DEFINE_QESDP_SPECIALIZATION_DTOR(QRegularExpressionMatchIteratorPrivate)
2593
2594/*!
2595 Constructs a QRegularExpressionMatchIterator object as a copy of \a
2596 iterator.
2597
2598 \sa operator=()
2599*/
2600QRegularExpressionMatchIterator::QRegularExpressionMatchIterator(const QRegularExpressionMatchIterator &iterator)
2601 : d(iterator.d)
2602{
2603}
2604
2605/*!
2606 \fn QRegularExpressionMatchIterator::QRegularExpressionMatchIterator(QRegularExpressionMatchIterator &&iterator)
2607
2608 \since 6.1
2609
2610 Constructs a QRegularExpressionMatchIterator object by moving from \a iterator.
2611
2612 Note that a moved-from QRegularExpressionMatchIterator can only be destroyed
2613 or assigned to. The effect of calling other functions than the destructor
2614 or one of the assignment operators is undefined.
2615
2616 \sa operator=()
2617*/
2618
2619/*!
2620 Assigns the iterator \a iterator to this object, and returns a reference to
2621 the copy.
2622*/
2623QRegularExpressionMatchIterator &QRegularExpressionMatchIterator::operator=(const QRegularExpressionMatchIterator &iterator)
2624{
2625 d = iterator.d;
2626 return *this;
2627}
2628
2629/*!
2630 \fn QRegularExpressionMatchIterator &QRegularExpressionMatchIterator::operator=(QRegularExpressionMatchIterator &&iterator)
2631
2632 Move-assigns the \a iterator to this object, and returns a reference to the
2633 result.
2634
2635 Note that a moved-from QRegularExpressionMatchIterator can only be destroyed
2636 or assigned to. The effect of calling other functions than the destructor
2637 or one of the assignment operators is undefined.
2638*/
2639
2640/*!
2641 \fn void QRegularExpressionMatchIterator::swap(QRegularExpressionMatchIterator &other)
2642 \memberswap{iterator}
2643*/
2644
2645/*!
2646 Returns \c true if the iterator object was obtained as a result from the
2647 QRegularExpression::globalMatch() function invoked on a valid
2648 QRegularExpression object; returns \c false if the QRegularExpression was
2649 invalid.
2650
2651 \sa QRegularExpression::globalMatch(), QRegularExpression::isValid()
2652*/
2653bool QRegularExpressionMatchIterator::isValid() const
2654{
2655 return d->next.isValid();
2656}
2657
2658/*!
2659 Returns \c true if there is at least one match result ahead of the iterator;
2660 otherwise it returns \c false.
2661
2662 \sa next()
2663*/
2664bool QRegularExpressionMatchIterator::hasNext() const
2665{
2666 return d->hasNext();
2667}
2668
2669/*!
2670 Returns the next match result without moving the iterator.
2671
2672 \note Calling this function when the iterator is at the end of the result
2673 set leads to undefined results.
2674*/
2675QRegularExpressionMatch QRegularExpressionMatchIterator::peekNext() const
2676{
2677 if (!hasNext())
2678 qWarning("QRegularExpressionMatchIterator::peekNext() called on an iterator already at end");
2679
2680 return d->next;
2681}
2682
2683/*!
2684 Returns the next match result and advances the iterator by one position.
2685
2686 \note Calling this function when the iterator is at the end of the result
2687 set leads to undefined results.
2688*/
2689QRegularExpressionMatch QRegularExpressionMatchIterator::next()
2690{
2691 if (!hasNext()) {
2692 qWarning("QRegularExpressionMatchIterator::next() called on an iterator already at end");
2693 return d.constData()->next;
2694 }
2695
2696 d.detach();
2697 return std::exchange(d->next, d->next.d.constData()->nextMatch());
2698}
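The iterator keeps the upcoming result precomputed: peekNext() returns it, and next() hands it out while replacing it with the following one through std::exchange. The sketch below shows that lookahead pattern on a deliberately simple matcher (runs of ASCII digits); DigitRunCursorSketch and its members are hypothetical and unrelated to the Qt private classes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical lookahead cursor (not Qt API): iterates over runs of digits.
class DigitRunCursorSketch {
public:
    explicit DigitRunCursorSketch(std::string subject)
        : m_subject(std::move(subject)), m_next(scanFrom(0)) {}

    bool hasNext() const { return m_next.start != std::string::npos; }

    // Returns the upcoming result without advancing.
    std::string peekNext() const
    {
        return hasNext() ? m_subject.substr(m_next.start, m_next.length) : std::string();
    }

    // Returns the upcoming result and precomputes the one after it.
    std::string next()
    {
        if (!hasNext())
            return std::string();
        const Span current = std::exchange(m_next, scanFrom(m_next.start + m_next.length));
        return m_subject.substr(current.start, current.length);
    }

private:
    struct Span { std::size_t start; std::size_t length; };

    // Finds the first digit run at or after pos; start == npos means "none".
    Span scanFrom(std::size_t pos) const
    {
        auto isDigit = [this](std::size_t i) {
            return std::isdigit(static_cast<unsigned char>(m_subject[i])) != 0;
        };
        while (pos < m_subject.size() && !isDigit(pos))
            ++pos;
        if (pos == m_subject.size())
            return Span{std::string::npos, 0};
        std::size_t end = pos;
        while (end < m_subject.size() && isDigit(end))
            ++end;
        return Span{pos, end - pos};
    }

    std::string m_subject; // declared first so scanFrom() can use it during construction
    Span m_next;
};
```

Unlike the Qt iterator, this sketch returns an empty string at the end instead of emitting a warning; the choice is only meant to keep the example short.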
2699
2700/*!
2701 Returns the QRegularExpression object whose globalMatch() function returned
2702 this object.
2703
2704 \sa QRegularExpression::globalMatch(), matchType(), matchOptions()
2705*/
2706QRegularExpression QRegularExpressionMatchIterator::regularExpression() const
2707{
2708 return d->regularExpression;
2709}
2710
2711/*!
2712 Returns the match type that was used to get this
2713 QRegularExpressionMatchIterator object, that is, the match type that was
2714 passed to QRegularExpression::globalMatch().
2715
2716 \sa QRegularExpression::globalMatch(), regularExpression(), matchOptions()
2717*/
2718QRegularExpression::MatchType QRegularExpressionMatchIterator::matchType() const
2719{
2720 return d->matchType;
2721}
2722
2723/*!
2724 Returns the match options that were used to get this
2725 QRegularExpressionMatchIterator object, that is, the match options that
2726 were passed to QRegularExpression::globalMatch().
2727
2728 \sa QRegularExpression::globalMatch(), regularExpression(), matchType()
2729*/
2730QRegularExpression::MatchOptions QRegularExpressionMatchIterator::matchOptions() const
2731{
2732 return d->matchOptions;
2733}
2734
2735/*!
2736 \internal
2737*/
2742
2743/*!
2744 \fn QtPrivate::QRegularExpressionMatchIteratorRangeBasedForIteratorSentinel end(const QRegularExpressionMatchIterator &)
2745 \internal
2746*/
2747
2748#ifndef QT_NO_DATASTREAM
2749/*!
2750 \relates QRegularExpression
2751
2752 Writes the regular expression \a re to stream \a out.
2753
2754 \sa {Serializing Qt Data Types}
2755*/
2756QDataStream &operator<<(QDataStream &out, const QRegularExpression &re)
2757{
2758 out << re.pattern() << quint32(re.patternOptions().toInt());
2759 return out;
2760}
2761
2762/*!
2763 \relates QRegularExpression
2764
2765 Reads a regular expression from stream \a in into \a re.
2766
2767 \sa {Serializing Qt Data Types}
2768*/
2769QDataStream &operator>>(QDataStream &in, QRegularExpression &re)
2770{
2771 QString pattern;
2772 quint32 patternOptions;
2773 in >> pattern >> patternOptions;
2774 re.setPattern(pattern);
2775 re.setPatternOptions(QRegularExpression::PatternOptions::fromInt(patternOptions));
2776 return in;
2777}
2778#endif
2779
2780#ifndef QT_NO_DEBUG_STREAM
2781/*!
2782 \relates QRegularExpression
2783
2784 Writes the regular expression \a re into the debug object \a debug for
2785 debugging purposes.
2786
2787 \sa {Debugging Techniques}
2788*/
2789QDebug operator<<(QDebug debug, const QRegularExpression &re)
2790{
2791 QDebugStateSaver saver(debug);
2792 debug.nospace() << "QRegularExpression(" << re.pattern() << ", " << re.patternOptions() << ')';
2793 return debug;
2794}
2795
2796/*!
2797 \relates QRegularExpression
2798
2799 Writes the pattern options \a patternOptions into the debug object \a debug
2800 for debugging purposes.
2801
2802 \sa {Debugging Techniques}
2803*/
2804QDebug operator<<(QDebug debug, QRegularExpression::PatternOptions patternOptions)
2805{
2806 QDebugStateSaver saver(debug);
2807 QByteArray flags;
2808
2809 if (patternOptions == QRegularExpression::NoPatternOption) {
2810 flags = "NoPatternOption";
2811 } else {
2812 flags.reserve(200); // worst case...
2813 if (patternOptions & QRegularExpression::CaseInsensitiveOption)
2814 flags.append("CaseInsensitiveOption|");
2815 if (patternOptions & QRegularExpression::DotMatchesEverythingOption)
2816 flags.append("DotMatchesEverythingOption|");
2817 if (patternOptions & QRegularExpression::MultilineOption)
2818 flags.append("MultilineOption|");
2819 if (patternOptions & QRegularExpression::ExtendedPatternSyntaxOption)
2820 flags.append("ExtendedPatternSyntaxOption|");
2821 if (patternOptions & QRegularExpression::InvertedGreedinessOption)
2822 flags.append("InvertedGreedinessOption|");
2823 if (patternOptions & QRegularExpression::DontCaptureOption)
2824 flags.append("DontCaptureOption|");
2825 if (patternOptions & QRegularExpression::UseUnicodePropertiesOption)
2826 flags.append("UseUnicodePropertiesOption|");
2827 flags.chop(1);
2828 }
2829
2830 debug.nospace() << "QRegularExpression::PatternOptions(" << flags << ')';
2831
2832 return debug;
2833}
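The flag formatting above appends "Name|" for every set bit and then chops the trailing separator. A stand-alone sketch of the same approach follows; the enum, its values and describeOptionsSketch are hypothetical and only mirror a subset of the pattern options.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag set (not the Qt enum), with one bit per option.
enum OptionSketch : unsigned {
    NoOptionSketch = 0x0,
    CaseInsensitiveSketch = 0x1,
    DotMatchesEverythingSketch = 0x2,
    MultilineSketch = 0x4
};

std::string describeOptionsSketch(unsigned options)
{
    if (options == NoOptionSketch)
        return "NoOption";

    std::string flags;
    if (options & CaseInsensitiveSketch)
        flags += "CaseInsensitive|";
    if (options & DotMatchesEverythingSketch)
        flags += "DotMatchesEverything|";
    if (options & MultilineSketch)
        flags += "Multiline|";

    if (!flags.empty())
        flags.pop_back(); // drop the trailing '|'
    return flags;
}
```

Appending the separator unconditionally and removing it once at the end avoids tracking whether a name has already been written.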
2834/*!
2835 \relates QRegularExpressionMatch
2836
2837 Writes the match object \a match into the debug object \a debug for
2838 debugging purposes.
2839
2840 \sa {Debugging Techniques}
2841*/
2842QDebug operator<<(QDebug debug, const QRegularExpressionMatch &match)
2843{
2844 QDebugStateSaver saver(debug);
2845 debug.nospace() << "QRegularExpressionMatch(";
2846
2847 if (!match.isValid()) {
2848 debug << "Invalid)";
2849 return debug;
2850 }
2851
2852 debug << "Valid";
2853
2854 if (match.hasMatch()) {
2855 debug << ", has match: ";
2856 for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
2857 debug << i
2858 << ":(" << match.capturedStart(i) << ", " << match.capturedEnd(i)
2859 << ", " << match.captured(i) << ')';
2860 if (i < match.lastCapturedIndex())
2861 debug << ", ";
2862 }
2863 } else if (match.hasPartialMatch()) {
2864 debug << ", has partial match: ("
2865 << match.capturedStart(0) << ", "
2866 << match.capturedEnd(0) << ", "
2867 << match.captured(0) << ')';
2868 } else {
2869 debug << ", no match";
2870 }
2871
2872 debug << ')';
2873
2874 return debug;
2875}
2876#endif
2877
2878// fool lupdate: make it extract those strings for translation, but don't put them
2879// inside Qt -- they're already inside libpcre (cf. man 3 pcreapi, pcre_compile.c).
2880#if 0
2881
2882/* PCRE is a library of functions to support regular expressions whose syntax
2883and semantics are as close as possible to those of the Perl 5 language.
2884
2885 Written by Philip Hazel
2886 Original API code Copyright (c) 1997-2012 University of Cambridge
2887 New API code Copyright (c) 2015 University of Cambridge
2888
2889-----------------------------------------------------------------------------
2890Redistribution and use in source and binary forms, with or without
2891modification, are permitted provided that the following conditions are met:
2892
2893 * Redistributions of source code must retain the above copyright notice,
2894 this list of conditions and the following disclaimer.
2895
2896 * Redistributions in binary form must reproduce the above copyright
2897 notice, this list of conditions and the following disclaimer in the
2898 documentation and/or other materials provided with the distribution.
2899
2900 * Neither the name of the University of Cambridge nor the names of its
2901 contributors may be used to endorse or promote products derived from
2902 this software without specific prior written permission.
2903
2904THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
2905AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
2906IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
2907ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
2908LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
2909CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
2910SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
2911INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
2912CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
2913ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2914POSSIBILITY OF SUCH DAMAGE.
2915-----------------------------------------------------------------------------
2916*/
2917
2918static const char *pcreCompileErrorCodes[] =
2919{
2920 QT_TRANSLATE_NOOP("QRegularExpression", "no error"),
2921 QT_TRANSLATE_NOOP("QRegularExpression", "\\ at end of pattern"),
2922 QT_TRANSLATE_NOOP("QRegularExpression", "\\c at end of pattern"),
2923 QT_TRANSLATE_NOOP("QRegularExpression", "unrecognized character follows \\"),
2924 QT_TRANSLATE_NOOP("QRegularExpression", "numbers out of order in {} quantifier"),
2925 QT_TRANSLATE_NOOP("QRegularExpression", "number too big in {} quantifier"),
2926 QT_TRANSLATE_NOOP("QRegularExpression", "missing terminating ] for character class"),
2927 QT_TRANSLATE_NOOP("QRegularExpression", "escape sequence is invalid in character class"),
2928 QT_TRANSLATE_NOOP("QRegularExpression", "range out of order in character class"),
2929 QT_TRANSLATE_NOOP("QRegularExpression", "quantifier does not follow a repeatable item"),
2930 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: unexpected repeat"),
2931 QT_TRANSLATE_NOOP("QRegularExpression", "unrecognized character after (? or (?-"),
2932 QT_TRANSLATE_NOOP("QRegularExpression", "POSIX named classes are supported only within a class"),
2933 QT_TRANSLATE_NOOP("QRegularExpression", "POSIX collating elements are not supported"),
2934 QT_TRANSLATE_NOOP("QRegularExpression", "missing closing parenthesis"),
2935 QT_TRANSLATE_NOOP("QRegularExpression", "reference to non-existent subpattern"),
2936 QT_TRANSLATE_NOOP("QRegularExpression", "pattern passed as NULL"),
2937 QT_TRANSLATE_NOOP("QRegularExpression", "unrecognised compile-time option bit(s)"),
2938 QT_TRANSLATE_NOOP("QRegularExpression", "missing ) after (?# comment"),
2939 QT_TRANSLATE_NOOP("QRegularExpression", "parentheses are too deeply nested"),
2940 QT_TRANSLATE_NOOP("QRegularExpression", "regular expression is too large"),
2941 QT_TRANSLATE_NOOP("QRegularExpression", "failed to allocate heap memory"),
2942 QT_TRANSLATE_NOOP("QRegularExpression", "unmatched closing parenthesis"),
2943 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: code overflow"),
2944 QT_TRANSLATE_NOOP("QRegularExpression", "missing closing parenthesis for condition"),
2945 QT_TRANSLATE_NOOP("QRegularExpression", "lookbehind assertion is not fixed length"),
2946 QT_TRANSLATE_NOOP("QRegularExpression", "a relative value of zero is not allowed"),
2947 QT_TRANSLATE_NOOP("QRegularExpression", "conditional subpattern contains more than two branches"),
2948 QT_TRANSLATE_NOOP("QRegularExpression", "assertion expected after (?( or (?(?C)"),
2949 QT_TRANSLATE_NOOP("QRegularExpression", "digit expected after (?+ or (?-"),
2950 QT_TRANSLATE_NOOP("QRegularExpression", "unknown POSIX class name"),
2951 QT_TRANSLATE_NOOP("QRegularExpression", "internal error in pcre2_study(): should not occur"),
2952 QT_TRANSLATE_NOOP("QRegularExpression", "this version of PCRE2 does not have Unicode support"),
2953 QT_TRANSLATE_NOOP("QRegularExpression", "parentheses are too deeply nested (stack check)"),
2954 QT_TRANSLATE_NOOP("QRegularExpression", "character code point value in \\x{} or \\o{} is too large"),
2955 QT_TRANSLATE_NOOP("QRegularExpression", "lookbehind is too complicated"),
2956 QT_TRANSLATE_NOOP("QRegularExpression", "\\C is not allowed in a lookbehind assertion in UTF-" "16" " mode"),
2957 QT_TRANSLATE_NOOP("QRegularExpression", "PCRE2 does not support \\F, \\L, \\l, \\N{name}, \\U, or \\u"),
2958 QT_TRANSLATE_NOOP("QRegularExpression", "number after (?C is greater than 255"),
2959 QT_TRANSLATE_NOOP("QRegularExpression", "closing parenthesis for (?C expected"),
2960 QT_TRANSLATE_NOOP("QRegularExpression", "invalid escape sequence in (*VERB) name"),
2961 QT_TRANSLATE_NOOP("QRegularExpression", "unrecognized character after (?P"),
2962 QT_TRANSLATE_NOOP("QRegularExpression", "syntax error in subpattern name (missing terminator?)"),
2963 QT_TRANSLATE_NOOP("QRegularExpression", "two named subpatterns have the same name (PCRE2_DUPNAMES not set)"),
2964 QT_TRANSLATE_NOOP("QRegularExpression", "subpattern name must start with a non-digit"),
2965 QT_TRANSLATE_NOOP("QRegularExpression", "this version of PCRE2 does not have support for \\P, \\p, or \\X"),
2966 QT_TRANSLATE_NOOP("QRegularExpression", "malformed \\P or \\p sequence"),
2967 QT_TRANSLATE_NOOP("QRegularExpression", "unknown property name after \\P or \\p"),
2968 QT_TRANSLATE_NOOP("QRegularExpression", "subpattern name is too long (maximum " "32" " code units)"),
2969 QT_TRANSLATE_NOOP("QRegularExpression", "too many named subpatterns (maximum " "10000" ")"),
2970 QT_TRANSLATE_NOOP("QRegularExpression", "invalid range in character class"),
2971 QT_TRANSLATE_NOOP("QRegularExpression", "octal value is greater than \\377 in 8-bit non-UTF-8 mode"),
2972 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: overran compiling workspace"),
2973 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: previously-checked referenced subpattern not found"),
2974 QT_TRANSLATE_NOOP("QRegularExpression", "DEFINE subpattern contains more than one branch"),
2975 QT_TRANSLATE_NOOP("QRegularExpression", "missing opening brace after \\o"),
2976 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: unknown newline setting"),
2977 QT_TRANSLATE_NOOP("QRegularExpression", "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number"),
2978 QT_TRANSLATE_NOOP("QRegularExpression", "(?R (recursive pattern call) must be followed by a closing parenthesis"),
2979 QT_TRANSLATE_NOOP("QRegularExpression", "obsolete error (should not occur)"),
2980 QT_TRANSLATE_NOOP("QRegularExpression", "(*VERB) not recognized or malformed"),
2981 QT_TRANSLATE_NOOP("QRegularExpression", "subpattern number is too big"),
2982 QT_TRANSLATE_NOOP("QRegularExpression", "subpattern name expected"),
2983 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: parsed pattern overflow"),
2984 QT_TRANSLATE_NOOP("QRegularExpression", "non-octal character in \\o{} (closing brace missing?)"),
2985 QT_TRANSLATE_NOOP("QRegularExpression", "different names for subpatterns of the same number are not allowed"),
2986 QT_TRANSLATE_NOOP("QRegularExpression", "(*MARK) must have an argument"),
2987 QT_TRANSLATE_NOOP("QRegularExpression", "non-hex character in \\x{} (closing brace missing?)"),
2988 QT_TRANSLATE_NOOP("QRegularExpression", "\\c must be followed by a printable ASCII character"),
2989 QT_TRANSLATE_NOOP("QRegularExpression", "\\c must be followed by a letter or one of [\\]^_?"),
2990 QT_TRANSLATE_NOOP("QRegularExpression", "\\k is not followed by a braced, angle-bracketed, or quoted name"),
2991 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: unknown meta code in check_lookbehinds()"),
2992 QT_TRANSLATE_NOOP("QRegularExpression", "\\N is not supported in a class"),
2993 QT_TRANSLATE_NOOP("QRegularExpression", "callout string is too long"),
2994 QT_TRANSLATE_NOOP("QRegularExpression", "disallowed Unicode code point (>= 0xd800 && <= 0xdfff)"),
2995 QT_TRANSLATE_NOOP("QRegularExpression", "using UTF is disabled by the application"),
2996 QT_TRANSLATE_NOOP("QRegularExpression", "using UCP is disabled by the application"),
2997 QT_TRANSLATE_NOOP("QRegularExpression", "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)"),
2998 QT_TRANSLATE_NOOP("QRegularExpression", "character code point value in \\u.... sequence is too large"),
2999 QT_TRANSLATE_NOOP("QRegularExpression", "digits missing in \\x{} or \\o{} or \\N{U+}"),
3000 QT_TRANSLATE_NOOP("QRegularExpression", "syntax error or number too big in (?(VERSION condition"),
3001 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: unknown opcode in auto_possessify()"),
3002 QT_TRANSLATE_NOOP("QRegularExpression", "missing terminating delimiter for callout with string argument"),
3003 QT_TRANSLATE_NOOP("QRegularExpression", "unrecognized string delimiter follows (?C"),
3004 QT_TRANSLATE_NOOP("QRegularExpression", "using \\C is disabled by the application"),
3005 QT_TRANSLATE_NOOP("QRegularExpression", "(?| and/or (?J: or (?x: parentheses are too deeply nested"),
3006 QT_TRANSLATE_NOOP("QRegularExpression", "using \\C is disabled in this PCRE2 library"),
3007 QT_TRANSLATE_NOOP("QRegularExpression", "regular expression is too complicated"),
3008 QT_TRANSLATE_NOOP("QRegularExpression", "lookbehind assertion is too long"),
3009 QT_TRANSLATE_NOOP("QRegularExpression", "pattern string is longer than the limit set by the application"),
3010 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: unknown code in parsed pattern"),
3011 QT_TRANSLATE_NOOP("QRegularExpression", "internal error: bad code value in parsed_skip()"),
3012 QT_TRANSLATE_NOOP("QRegularExpression", "PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode"),
3013 QT_TRANSLATE_NOOP("QRegularExpression", "invalid option bits with PCRE2_LITERAL"),
3014 QT_TRANSLATE_NOOP("QRegularExpression", "\\N{U+dddd} is supported only in Unicode (UTF) mode"),
3015 QT_TRANSLATE_NOOP("QRegularExpression", "invalid hyphen in option setting"),
3016 QT_TRANSLATE_NOOP("QRegularExpression", "(*alpha_assertion) not recognized"),
3017 QT_TRANSLATE_NOOP("QRegularExpression", "script runs require Unicode support, which this version of PCRE2 does not have"),
3018 QT_TRANSLATE_NOOP("QRegularExpression", "too many capturing groups (maximum 65535)"),
3019 QT_TRANSLATE_NOOP("QRegularExpression", "atomic assertion expected after (?( or (?(?C)"),
3020 QT_TRANSLATE_NOOP("QRegularExpression", "no error"),
3021 QT_TRANSLATE_NOOP("QRegularExpression", "no match"),
3022 QT_TRANSLATE_NOOP("QRegularExpression", "partial match"),
3023 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 1 byte missing at end"),
3024 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 2 bytes missing at end"),
3025 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 3 bytes missing at end"),
3026 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 4 bytes missing at end"),
3027 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 5 bytes missing at end"),
3028 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: byte 2 top bits not 0x80"),
3029 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: byte 3 top bits not 0x80"),
3030 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: byte 4 top bits not 0x80"),
3031 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: byte 5 top bits not 0x80"),
3032 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: byte 6 top bits not 0x80"),
3033 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 5-byte character is not allowed (RFC 3629)"),
3034 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: 6-byte character is not allowed (RFC 3629)"),
3035 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: code points greater than 0x10ffff are not defined"),
3036 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: code points 0xd800-0xdfff are not defined"),
3037 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: overlong 2-byte sequence"),
3038 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: overlong 3-byte sequence"),
3039 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: overlong 4-byte sequence"),
3040 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: overlong 5-byte sequence"),
3041 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: overlong 6-byte sequence"),
3042 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: isolated byte with 0x80 bit set"),
3043 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-8 error: illegal byte (0xfe or 0xff)"),
3044 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-16 error: missing low surrogate at end"),
3045 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-16 error: invalid low surrogate"),
3046 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-16 error: isolated low surrogate"),
3047 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-32 error: code points 0xd800-0xdfff are not defined"),
3048 QT_TRANSLATE_NOOP("QRegularExpression", "UTF-32 error: code points greater than 0x10ffff are not defined"),
3049 QT_TRANSLATE_NOOP("QRegularExpression", "bad data value"),
3050 QT_TRANSLATE_NOOP("QRegularExpression", "patterns do not all use the same character tables"),
3051 QT_TRANSLATE_NOOP("QRegularExpression", "magic number missing"),
3052 QT_TRANSLATE_NOOP("QRegularExpression", "pattern compiled in wrong mode: 8/16/32-bit error"),
3053 QT_TRANSLATE_NOOP("QRegularExpression", "bad offset value"),
3054 QT_TRANSLATE_NOOP("QRegularExpression", "bad option value"),
3055 QT_TRANSLATE_NOOP("QRegularExpression", "invalid replacement string"),
3056 QT_TRANSLATE_NOOP("QRegularExpression", "bad offset into UTF string"),
3057 QT_TRANSLATE_NOOP("QRegularExpression", "callout error code"),
3058 QT_TRANSLATE_NOOP("QRegularExpression", "invalid data in workspace for DFA restart"),
3059 QT_TRANSLATE_NOOP("QRegularExpression", "too much recursion for DFA matching"),
3060 QT_TRANSLATE_NOOP("QRegularExpression", "backreference condition or recursion test is not supported for DFA matching"),
3061 QT_TRANSLATE_NOOP("QRegularExpression", "function is not supported for DFA matching"),
3062 QT_TRANSLATE_NOOP("QRegularExpression", "pattern contains an item that is not supported for DFA matching"),
3063 QT_TRANSLATE_NOOP("QRegularExpression", "workspace size exceeded in DFA matching"),
3064 QT_TRANSLATE_NOOP("QRegularExpression", "internal error - pattern overwritten?"),
3065 QT_TRANSLATE_NOOP("QRegularExpression", "bad JIT option"),
3066 QT_TRANSLATE_NOOP("QRegularExpression", "JIT stack limit reached"),
3067 QT_TRANSLATE_NOOP("QRegularExpression", "match limit exceeded"),
3068 QT_TRANSLATE_NOOP("QRegularExpression", "no more memory"),
3069 QT_TRANSLATE_NOOP("QRegularExpression", "unknown substring"),
3070 QT_TRANSLATE_NOOP("QRegularExpression", "non-unique substring name"),
3071 QT_TRANSLATE_NOOP("QRegularExpression", "NULL argument passed"),
3072 QT_TRANSLATE_NOOP("QRegularExpression", "nested recursion at the same subject position"),
3073 QT_TRANSLATE_NOOP("QRegularExpression", "matching depth limit exceeded"),
3074 QT_TRANSLATE_NOOP("QRegularExpression", "requested value is not available"),
3075 QT_TRANSLATE_NOOP("QRegularExpression", "requested value is not set"),
3076 QT_TRANSLATE_NOOP("QRegularExpression", "offset limit set without PCRE2_USE_OFFSET_LIMIT"),
3077 QT_TRANSLATE_NOOP("QRegularExpression", "bad escape sequence in replacement string"),
3078 QT_TRANSLATE_NOOP("QRegularExpression", "expected closing curly bracket in replacement string"),
3079 QT_TRANSLATE_NOOP("QRegularExpression", "bad substitution in replacement string"),
3080 QT_TRANSLATE_NOOP("QRegularExpression", "match with end before start or start moved backwards is not supported"),
3081 QT_TRANSLATE_NOOP("QRegularExpression", "too many replacements (more than INT_MAX)"),
3082 QT_TRANSLATE_NOOP("QRegularExpression", "bad serialized data"),
3083 QT_TRANSLATE_NOOP("QRegularExpression", "heap limit exceeded"),
3084 QT_TRANSLATE_NOOP("QRegularExpression", "invalid syntax"),
3085 QT_TRANSLATE_NOOP("QRegularExpression", "internal error - duplicate substitution match"),
3086 QT_TRANSLATE_NOOP("QRegularExpression", "PCRE2_MATCH_INVALID_UTF is not supported for DFA matching"),
3087 QT_TRANSLATE_NOOP("QRegularExpression", "INTERNAL ERROR: invalid substring offset")
3088};
3089#endif // #if 0
3090
3091QT_END_NAMESPACE
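
The debug operator for QRegularExpression::PatternOptions in the listing above builds its output by appending "Name|" for every option bit that is set and then chopping the trailing separator. The following is a minimal sketch of that technique in standard C++ only, not the Qt implementation: the PatternOption enum and the describeOptions() function are hypothetical stand-ins, the numeric values are illustrative, and std::string replaces QByteArray.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for QRegularExpression::PatternOption. The names
// mirror three of the Qt enumerators; the numeric values are illustrative.
enum PatternOption : std::uint32_t {
    NoPatternOption            = 0x0,
    CaseInsensitiveOption      = 0x1,
    DotMatchesEverythingOption = 0x2,
    MultilineOption            = 0x4,
};

// Hypothetical helper: returns the names of all set options joined by '|'.
// Each set bit appends "Name|"; the trailing separator is removed at the end,
// which corresponds to the flags.chop(1) call in the listing.
std::string describeOptions(std::uint32_t options)
{
    if (options == NoPatternOption)
        return "NoPatternOption";

    std::string flags;
    flags.reserve(200); // generous upper bound, as in the listing

    if (options & CaseInsensitiveOption)
        flags += "CaseInsensitiveOption|";
    if (options & DotMatchesEverythingOption)
        flags += "DotMatchesEverythingOption|";
    if (options & MultilineOption)
        flags += "MultilineOption|";

    // Only unknown bits may have been set, so guard before removing the
    // trailing '|' (std::string::pop_back on an empty string is undefined).
    if (!flags.empty())
        flags.pop_back();

    return flags;
}
```

Calling describeOptions(CaseInsensitiveOption | MultilineOption) yields the two names separated by a single '|'. Appending the separator unconditionally and trimming once keeps each branch a single statement, at the cost of one extra character per flag before the trim.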