Internal/Contributor docs for the Qt SDK. Note: These are NOT official API docs; those are found at https://doc.qt.io/
qstring-overview.qdoc
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GFDL-1.3-no-invariants-only

/*!
\group string-processing

\title Classes for string data

\section1 Overview

This page gives an overview of the string classes in Qt, in particular the
large number of string containers, and explains how to use them efficiently
in performance-critical code.

The following instructions for efficient use are aimed at experienced
developers working on performance-critical code that contains considerable
amounts of string processing, such as a parser or a text file generator.
\e {Generally, \l QString can be used everywhere and it will perform fine.}
It also provides APIs for handling several encodings (for example
\l{QString::fromLatin1()}). For many applications, especially when string
processing plays an insignificant role for performance, \l QString is a
simple and sufficient solution. Some Qt functions return a \l QStringView,
which can be converted to a \l QString with \l{QStringView::toString()} if
required.

\section2 Impactful tips

The following three rules improve string handling substantially without
adding much complexity. Follow them to get nearly optimal performance in
most cases. The first two rules address the encoding of string literals and
how to mark them in source code. The third rule addresses deep copies when
using parts of a string.

\list

\li All strings that only contain ASCII characters (for example log
messages) can be encoded with Latin-1. Use the
\l{Qt::Literals::StringLiterals::operator""_L1}{string literal}
\c{"foo"_L1}. Without this suffix, string literals in source code are
assumed to be UTF-8 encoded, and processing them is slower. Generally, use
the tightest encoding, which is Latin-1 in many cases.

\li User-visible strings are usually translated and thus passed through the
\l {QObject::tr()} function. This function takes a string literal (a
\c{const char} array) and returns a \l QString with the UTF-16 encoding
required by all UI elements. If the translation infrastructure is not used,
use UTF-16 encoding throughout the whole application. Use the string literal
\c{u"foo"} to create UTF-16 string literals, or the Qt-specific literal
\c{u"foo"_s} to directly create a \l QString.

\li When processing parts of a \l QString, create \l QStringView objects
instead of copying each part into its own \l QString object. These can be
converted back to \l QString using \l{QStringView::toString()}, but avoid
doing so as much as possible. If functions return \l QStringView, it is most
efficient to keep working with this class if possible. Its API is similar to
that of a constant \l QString.

\endlist

\section2 Efficient usage

To use string classes efficiently, it helps to understand three concepts:
\list
\li Encoding
\li Owning and non-owning containers
\li Literals
\endlist

\section3 Encoding

Encoding-wise, Qt supports UTF-16, UTF-8, Latin-1 (ISO 8859-1) and US-ASCII
(the common subset of Latin-1 and UTF-8) in one form or another.
\list
\li Latin-1 is a character encoding that uses a single byte per character,
which makes it the most efficient but also the most limited encoding: it can
represent only 256 characters.
\li UTF-8 is a variable-length character encoding that encodes all
characters using one to four bytes. It is backwards compatible with
US-ASCII and it is the common encoding for source code and similar
files. Qt assumes that source code is encoded in UTF-8.
\li UTF-16 is a variable-length encoding that uses two or four bytes per
character. It is the common encoding for user-exposed text in Qt.
\endlist
For more information, see \l{Unicode in Qt}{the overview of Unicode support
in Qt}.

Other encodings are supported in the form of single functions like
\l{QString::fromUcs4()} or of the \l{QStringConverter} classes. Furthermore,
Qt provides an encoding-agnostic container for data, \l QByteArray, that is
well-suited to storing binary data. \l QAnyStringView keeps track of the
encoding of the underlying string and can thus carry a view onto strings
in any of the supported encodings.

Converting between encodings is expensive, so avoid it where possible. On
the other hand, a more compact encoding, particularly for string literals,
can reduce binary size, which can increase performance. Where string
literals can be expressed in Latin-1, Latin-1 strikes a good compromise
between these competing factors, even if the string has to be converted to
UTF-16 at some point. Converting a Latin-1 string to a \l QString is
relatively efficient.

\section3 Functionality

String classes can be further distinguished by the functionality they
support. One major distinction is whether they own, and thus control, their
data or merely reference data held elsewhere. The former are called \e owning
containers, the latter \e non-owning containers or views. A non-owning
container type typically just records a pointer to the start of the data and
its size, making it lightweight and cheap, but it only remains valid as long
as the data remains available. An owning string manages the memory in which
it stores its data, ensuring that the data remains available throughout the
lifetime of the container, but its creation and destruction incur the costs
of allocating and releasing memory. Views typically support a subset of the
functions of the owning string and cannot modify the underlying data.

As a result, string views are particularly well-suited to representing
parts of larger strings, for example in a parser, while owning strings are
good for persistent storage, such as members of a class. Where a function
returns a string that it has constructed, for example by combining
fragments, it has to return an owning string; but where a function returns
part of some persistently stored string, a view is usually more suitable.

Note that owning containers in Qt share their data
\l{Implicit Sharing}{implicitly}, meaning that it is also efficient to pass
or return large containers by value, although slightly less efficient than
passing by reference due to the reference counting. To benefit from the
implicit data sharing mechanism of Qt classes, pass the string as an owning
container or a reference to one. Converting to a view and back always
creates an additional copy of the data.

Finally, Qt provides classes for single characters, lists of strings and
string matchers. These classes are available for most supported encodings
in Qt, with some exceptions. Higher-level functionality is provided by
specialized classes, such as \l QLocale or \l QTextBoundaryFinder. These
high-level classes usually rely on \l QString and its UTF-16 encoding. Some
classes are templates and work with all available string classes.

\section3 Literals

The C++ standard provides
\l{https://en.cppreference.com/w/cpp/language/string_literal}{string
literals} to create strings at compile time. There are string literals
defined by the language and literals defined by Qt, so-called
\l{https://en.cppreference.com/w/cpp/language/user_literal}{user-defined
literals}. A string literal defined by C++ is enclosed in double quotes and
can have a prefix that tells the compiler how to interpret its content. For
Qt, the UTF-16 string literal \c{u"foo"} is the most important. It creates
a string encoded in UTF-16 at compile time, saving the need to convert from
some other encoding at run time. \l QStringView can be easily and
efficiently constructed from one, so such literals can be passed to
functions that accept a \l QStringView argument (or, as a result, a
\l QAnyStringView).

User-defined literals have the same form as those defined by C++ but add a
suffix after the closing quote. The encoding is still determined by the
prefix, but the resulting literal is used to construct an object of some
user-defined type. Qt thus defines these for some of its own string types:
\c{u"foo"_s} for \l QString, \c{"foo"_L1} for \l QLatin1StringView and
\c{"foo"_ba} for \l QByteArray. These are provided in the
\l{Qt::Literals::StringLiterals}{StringLiterals namespace}. A plain C++
string literal \c{"foo"} is interpreted as UTF-8, so converting it to
\l QString, and thus to UTF-16, is expensive. When you have string literals
in plain ASCII, use \c{"foo"_L1} to interpret them as Latin-1, gaining the
various benefits outlined above.

\section1 Basic string classes

The following table gives an overview of the basic string classes for the
various text encodings.

\table
\header
\li Encoding
\li C++ string literal
\li Qt user-defined literal
\li C++ character
\li Qt character
\li Owning string
\li Non-owning string
\row
\li Latin-1
\li -
\li ""_L1
\li -
\li \l QLatin1Char
\li -
\li \l QLatin1StringView
\row
\li UTF-8
\li u8""
\li -
\li char8_t
\li -
\li -
\li \l QUtf8StringView
\row
\li UTF-16
\li u""
\li u""_s
\li char16_t
\li \l QChar
\li \l QString
\li \l QStringView
\row
\li Binary/None
\li -
\li ""_ba
\li std::byte
\li -
\li \l QByteArray
\li \l QByteArrayView
\row
\li Flexible
\li any
\li -
\li -
\li -
\li -
\li \l QAnyStringView
\endtable

Some of the missing entries can be substituted with built-in and standard
library C++ types: an owning Latin-1 or UTF-8 encoded string can be a
\c{std::string} or any 8-bit \c char array. \l QStringView can also reference
any 16-bit character array, such as \c{std::u16string}, or \c{std::wstring}
on platforms where \c wchar_t is 16 bits wide (such as Windows).

Qt also provides specialized lists for some of those types, namely
\l QStringList and \l QByteArrayList, as well as matchers, such as
\l QLatin1StringMatcher and \l QByteArrayMatcher. The matchers also have
static versions that are created at compile time:
\l QStaticLatin1StringMatcher and \l QStaticByteArrayMatcher.

Further points worth noting:

\list

\li \l QStringLiteral is a macro that has the same effect as \c{u"foo"_s}
and is available without the \l{Qt::Literals::StringLiterals}{StringLiterals
namespace}. Prefer the modern string literal.

\li \l QLatin1String is a synonym for \l QLatin1StringView and exists for
backwards compatibility. It is not an owning string and might be removed in
future releases.

\li \l QAnyStringView provides a view of a string in any of the three
supported encodings. The encoding is stored alongside the reference to the
data. This class is well suited to creating interfaces that take a wide
spectrum of string types and encodings. In contrast to other classes, no
processing is done on \l QAnyStringView directly. Processing is done on the
underlying \l QLatin1StringView, \l QUtf8StringView or \l QStringView in the
respective encoding. Use \l QAnyStringView::visit() to do the same in your
own functions that take this class as an argument.

\li A \l QLatin1StringView with non-ASCII characters is not straightforward
to construct in a UTF-8 encoded source code file and requires special
treatment; see the \l QLatin1StringView documentation.

\li \l QStringRef is a reference to a portion of a \l QString, available in
the Qt5Compat module for backwards compatibility. It should be replaced by
\l QStringView.

\endlist

\section1 High-level string-related classes

Higher-level classes that provide additional functionality work mostly
with \l QString and thus UTF-16. These are:

\list
\li \l QRegularExpression, \l QRegularExpressionMatch and
\l QRegularExpressionMatchIterator to work with pattern matching
and regular expressions.
\li \l QLocale to convert numbers and dates to and from strings in a
manner appropriate to the user's language and culture.
\li \l QCollator and \l QCollatorSortKey to compare strings with
respect to the user's language, script or territory.
\li \l QTextBoundaryFinder to break up text ready for typesetting
in accordance with Unicode rules.
\li \c{QStringBuilder}, an internal class that substantially improves the
performance of string concatenations with the \c{%} operator, or with the
\c{+} operator when \c QT_USE_QSTRINGBUILDER is defined; see the \l QString
documentation.
\endlist

Some classes are templates or have a flexible API and work with various
string classes. These are:

\list
\li \l QTextStream to stream into \l QIODevice, \l QByteArray or
\l QString
\li \l QStringTokenizer to split strings
\endlist

\section1 Which string class to use?

The general guidance in using string classes is:
\list
\li Avoid copying and memory allocations,
\li Avoid encoding conversions, and
\li Choose the most compact encoding.
\endlist

Qt provides many ways to avoid memory allocations. Most Qt containers
employ \l{Implicit Sharing} of their data. For implicit sharing to work,
there must be an uninterrupted chain of the same class: converting from
\l QString to \l QStringView and back results in two \l {QString}{QStrings}
that do not share their data. Therefore, functions need to pass their data
as \l QString (both values and references work). Extracting parts of a
string is not possible with implicit data sharing. To use parts of a longer
string, use string views, an explicit form of data sharing.

Conversions between encodings can be reduced by sticking to a certain
encoding. Data received, for example in UTF-8, is best stored and processed
in UTF-8 if no conversion to any other encoding is required. Comparisons
between strings of the same encoding are fastest, and the same is the case
for most other operations. If strings of a certain encoding are often
compared with or converted to another encoding, it might be beneficial to
convert and store them once. Some operations provide many overloads (or a
\l QAnyStringView overload) that take various string types and encodings;
if using the same encoding is not feasible, these overloads are the second
choice to optimize performance. Explicit encoding conversions before calling
a function should be a last resort when no other option is available.
Latin-1 is a very simple encoding, and operations between Latin-1 and any
other encoding are almost as efficient as operations between the same
encoding.

The most efficient encoding (from most to least efficient: Latin-1, UTF-8,
UTF-16) should be chosen when no other constraints determine the encoding.
For error handling and logging, \l QLatin1StringView is usually sufficient.
User-visible strings in Qt are always of type \l {QString} and as such
UTF-16 encoded. Therefore, it is most effective to use \l {QString}{QStrings},
\l {QStringView}{QStringViews} and \l {QStringLiteral}{QStringLiterals}
throughout the lifetime of a user-visible string. The \l QObject::tr()
function provides the correct encoding and type. Use \l QByteArray if the
encoding does not play a role, for example to store binary data, or if the
encoding is unknown.

\section2 String class for creating API

\image string_class_api.svg "String class for an optimal API"

\section3 Member variables

Member variables should be of an owning type in nearly all cases. Views can
only be used as member variables if the lifetime of the referenced owning
string is guaranteed to exceed the lifetime of the object.

\section3 Function arguments

Function arguments should be string views of a suitable encoding in most
cases. \l QAnyStringView can be used as a parameter to support more than
one encoding, and \l QAnyStringView::visit() can be used internally to fork
off into per-encoding functions. If the function is limited to a single
encoding, use \l QLatin1StringView, \l QUtf8StringView, \l QStringView or
\l QByteArrayView.

If the function saves the argument in an owning string (usually a
setter function), it is most efficient to use the same owning string type
as the function argument to make use of Qt's implicit data sharing. The
owning string can be passed as a \c const reference. Overloading functions
with multiple owning and non-owning string types can lead to overload
ambiguity and should be avoided. Owning string types in Qt can be
implicitly converted to their non-owning version or to \l QAnyStringView.

\section3 Return values

Temporary strings have to be returned as an owning string, usually
\l QString. If the returned string is known at compile time, use
\c{u"foo"_s} to construct the \l QString data at compile time. If
existing owning strings (for example \l QString) are returned from a
function in full (for example by a getter function), it is most efficient to
return them by reference. They can also be returned by value to allow
returning a temporary in the future. Qt's use of implicit sharing avoids
the performance impact of allocation and copying when returning by value.

Parts of existing strings can be returned efficiently with a string view
of the appropriate encoding; for an example, see
\l QRegularExpressionMatch::capturedView(), which returns a \l QStringView.

\section2 String class for using API

\image string_class_calling.svg "String class for calling a function"

To use a Qt API efficiently, try to match the function argument types. If
you are limited in your choice, Qt will conduct various conversions: owning
strings are implicitly converted to non-owning strings, and non-owning
strings can create their owning counterparts; see for example
\l QStringView::toString(). Encoding conversions are conducted implicitly in
many cases, but this should be avoided if possible. To avoid accidental
implicit conversion from UTF-8, you can define the macro
\l QT_NO_CAST_FROM_ASCII.

If you need to assemble a string at runtime before passing it to a function,
you will need an owning string and thus \l QString. If the function
argument is \l QStringView or \l QAnyStringView, it will be implicitly
converted.

If the string is known at compile time, there is room for optimization. If
the function accepts a \l QString, create it with \c{u"foo"_s} or the
\l QStringLiteral macro. If the function expects a \l QStringView, it is
best constructed with an ordinary UTF-16 string literal \c{u"foo"}. If a
\l QLatin1StringView is expected, construct it with \c{"foo"_L1}. If you
have the choice between both, for example if the function expects
\l QAnyStringView, use the tightest encoding, usually Latin-1.

\section1 List of all string related classes
*/

// The list is autogenerated by qdoc because this is a group page