Qt
Internal/Contributor docs for the Qt SDK. Note: These are NOT official API docs; those are found at https://doc.qt.io/
QStringConverter Class Reference

Part of the QtCore module.

#include <qstringconverter_base.h>


Classes

class  FinalizeResultBase
class  FinalizeResultChar
struct  Interface
struct  State

Public Types

enum class  Flag {
  Default = 0 , Stateless = 0x1 , ConvertInvalidToNull = 0x2 , WriteBom = 0x4 ,
  ConvertInitialBom = 0x8 , UsesIcu = 0x10
}
 Default: Default conversion rules apply.
enum  Encoding {
  Utf8 , Utf16 , Utf16LE , Utf16BE ,
  Utf32 , Utf32LE , Utf32BE , Latin1 ,
  System , LastEncoding = System
}
 Utf8: Create a converter to or from UTF-8. Utf16: Create a converter to or from UTF-16.

Public Member Functions

 QStringConverter (QStringConverter &&)=default
QStringConverter & operator= (QStringConverter &&)=default
bool isValid () const noexcept
 Returns true if this is a valid string converter that can be used for encoding or decoding text.
void resetState () noexcept
 Resets the internal state of the converter, clearing potential errors or partial conversions.
bool hasError () const noexcept
 Returns true if a conversion could not correctly convert a character.
Q_CORE_EXPORT const char * name () const noexcept
 Returns the canonical name of the encoding this QStringConverter can encode or decode.

Static Public Member Functions

static Q_CORE_EXPORT std::optional< Encoding > encodingForName (QAnyStringView name) noexcept
 Converts name to the corresponding Encoding member, if there is one.
Q_DECL_PURE_FUNCTION static Q_CORE_EXPORT const char * nameForEncoding (Encoding e) noexcept
 Returns the canonical name for encoding e or nullptr if e is an invalid value.
static Q_CORE_EXPORT std::optional< Encoding > encodingForData (QByteArrayView data, char16_t expectedFirstCharacter=0) noexcept
 Returns the encoding for the content of data if it can be determined.
static Q_CORE_EXPORT std::optional< Encoding > encodingForHtml (QByteArrayView data)
 Tries to determine the encoding of the HTML in data by looking at leading byte order marks or a charset specifier in the HTML meta tag.
static Q_CORE_EXPORT QStringList availableCodecs ()
 Returns a list of names of supported codecs.

Protected Member Functions

constexpr QStringConverter () noexcept
constexpr QStringConverter (Encoding encoding, Flags f)
constexpr QStringConverter (const Interface *i) noexcept
Q_CORE_EXPORT QStringConverter (QAnyStringView name, Flags f)
 ~QStringConverter ()=default

Protected Attributes

const Interface * iface
State state

Detailed Description

The QStringConverter class provides a base class for encoding and decoding text. It is part of the QtCore module. All functions in this class are reentrant.

Qt uses UTF-16 to store, draw and manipulate strings. In many situations you may wish to deal with data that uses a different encoding. Most text data transferred over files and network connections is encoded in UTF-8.

The QStringConverter class is a base class for the QStringEncoder and QStringDecoder classes, which help with converting between different text encodings. QStringDecoder decodes a string from an encoded representation into UTF-16, the format Qt uses internally. QStringEncoder does the opposite, encoding UTF-16 data (usually in the form of a QString) to the requested encoding.

The following encodings are always supported:

  • UTF-8
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE
  • ISO-8859-1 (Latin-1)
  • The system encoding

QStringConverter may support more encodings depending on how Qt was compiled. If more codecs are supported, they can be listed using availableCodecs().

QStringConverters can be used as follows to convert an encoded string to and from UTF-16.

Suppose you have a string encoded in UTF-8 and want to convert it to a QString. The simple way to do it is to use a QStringDecoder like this:

QByteArray encodedString = "...";
auto toUtf16 = QStringDecoder(QStringDecoder::Utf8);
QString string = toUtf16(encodedString);

After this, string holds the text in decoded form. Converting a string from Unicode to the local encoding is just as easy using the QStringEncoder class:

QString string = "...";
auto fromUtf16 = QStringEncoder(QStringEncoder::System); // System selects the local encoding
QByteArray encodedString = fromUtf16(string);

To read or write text files in various encodings, use QTextStream and its setEncoding() function.

Some care must be taken when trying to convert the data in chunks, for example, when receiving it over a network. In such cases it is possible that a multi-byte character will be split over two chunks. At best this might result in the loss of a character and at worst cause the entire conversion to fail.

Both QStringEncoder and QStringDecoder handle this by tracking incomplete sequences in an internal state. Calling the encoder or decoder again with the next chunk of data therefore continues the conversion correctly:

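auto toUtf16 = QStringDecoder(QStringDecoder::Utf8);

QString string;
// new_data_available() and get_new_data() stand in for the application's own I/O
while (new_data_available() && !toUtf16.hasError()) {
    QByteArray chunk = get_new_data();
    string += toUtf16(chunk);
}
auto result = toUtf16.finalize();
// Inspect result and handle any error it reports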

The QStringDecoder object maintains state between chunks and therefore works correctly even if a multi-byte character is split between chunks.
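To make the role of that internal state concrete, here is a minimal standalone sketch in standard C++. It is not Qt's implementation, and the class name ChunkedUtf8Decoder and its finish() method are hypothetical. It only shows the general technique: bytes of an unfinished multi-byte character are held back and completed by the next chunk.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical illustration only; this is not Qt's implementation.
// Decodes UTF-8 to UTF-16 and carries an unfinished trailing sequence
// over to the next call. Overlong forms, encoded surrogates and values
// above U+10FFFF are not rejected, to keep the sketch short.
class ChunkedUtf8Decoder {
public:
    std::u16string operator()(std::string_view chunk) {
        pending_.append(chunk);
        std::u16string out;
        std::size_t i = 0;
        while (i < pending_.size()) {
            const auto lead = static_cast<unsigned char>(pending_[i]);
            const std::size_t len = sequenceLength(lead);
            if (len != 0 && i + len > pending_.size())
                break; // incomplete character: keep it for the next chunk
            char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
            bool valid = (len != 0);
            for (std::size_t k = 1; valid && k < len; ++k) {
                const auto c = static_cast<unsigned char>(pending_[i + k]);
                valid = (c & 0xC0) == 0x80; // must be a continuation byte
                cp = (cp << 6) | (c & 0x3F);
            }
            if (!valid) {
                out.push_back(u'\uFFFD'); // replacement character
                error_ = true;
                ++i; // resynchronize on the next byte
                continue;
            }
            appendUtf16(out, cp);
            i += len;
        }
        pending_.erase(0, i); // drop what was consumed, keep the remainder
        return out;
    }

    bool hasError() const { return error_; }

    // Call after the last chunk: leftover bytes mean the input was truncated.
    bool finish() {
        if (!pending_.empty()) {
            error_ = true;
            pending_.clear();
        }
        return !error_;
    }

private:
    static std::size_t sequenceLength(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        return 0; // not a valid lead byte
    }

    static void appendUtf16(std::u16string &out, char32_t cp) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000; // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }

    std::string pending_; // bytes of a character not yet complete
    bool error_ = false;
};
```

Feeding this sketch the three bytes of a euro sign split across two calls produces nothing for the incomplete part in the first call and the complete character in the second. A stateless decoder would have to report an error at the end of the first chunk.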

QStringConverter objects can't be copied because of their internal state, but can be moved.

See also
QTextStream, QStringDecoder, QStringEncoder

Definition at line 33 of file qstringconverter_base.h.

Member Enumeration Documentation

◆ Encoding

  • Utf8: Create a converter to or from UTF-8.
  • Utf16: Create a converter to or from UTF-16. When decoding, the byte order is detected automatically from a leading byte order mark. If none exists, or when encoding, the system byte order is assumed.
  • Utf16BE: Create a converter to or from big-endian UTF-16.
  • Utf16LE: Create a converter to or from little-endian UTF-16.
  • Utf32: Create a converter to or from UTF-32. When decoding, the byte order is detected automatically from a leading byte order mark. If none exists, or when encoding, the system byte order is assumed.
  • Utf32BE: Create a converter to or from big-endian UTF-32.
  • Utf32LE: Create a converter to or from little-endian UTF-32.
  • Latin1: Create a converter to or from ISO-8859-1 (Latin-1).
  • System: Create a converter to or from the underlying encoding of the operating system's locale. This is always assumed to be UTF-8 on Unix-based systems. On Windows, this converts to and from the locale code page.

Enumerator
Utf8 
Utf16 
Utf16LE 
Utf16BE 
Utf32 
Utf32LE 
Utf32BE 
Latin1 
System 
LastEncoding 

Definition at line 102 of file qstringconverter_base.h.

◆ Flag

enum class QStringConverter::Flag
strong

  • Default: Default conversion rules apply.
  • ConvertInvalidToNull: If this flag is set, each invalid input character is output as a null character. If it is not set, invalid input characters are represented as QChar::ReplacementCharacter if the output encoding can represent that character, otherwise as a question mark.
  • WriteBom: When converting from a QString to an output encoding, write a QChar::ByteOrderMark as the first character if the output encoding supports this. This is the case for the UTF-8, UTF-16 and UTF-32 encodings.
  • ConvertInitialBom: When converting from an input encoding to a QString, the QStringDecoder usually skips a leading QChar::ByteOrderMark. When this flag is set, the byte order mark is not skipped, but converted to UTF-16 and inserted at the start of the created QString.
  • Stateless: Ignore possible converter states between different function calls to encode or decode strings. This also causes the QStringConverter to raise an error if an incomplete sequence of data is encountered.
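The difference between the default fallback and ConvertInvalidToNull can be illustrated with a small standalone sketch in standard C++. The helper encodeLatin1() and the struct Latin1Result are hypothetical and not part of Qt. The sketch assumes an ISO-8859-1 target, which cannot represent U+FFFD, so the fallback is a question mark unless a null character is requested. It treats every code unit above 0xFF as unconvertible, which may differ in detail from what Qt does.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not part of Qt: encodes UTF-16 as ISO-8859-1.
// Code units above 0xFF (including surrogates) cannot be represented.
struct Latin1Result {
    std::string bytes;
    bool hadError = false; // set when at least one code unit was replaced
};

Latin1Result encodeLatin1(std::u16string_view text, bool convertInvalidToNull = false)
{
    // Latin-1 cannot hold U+FFFD, so the fallback is '?' (or NUL when requested).
    const char replacement = convertInvalidToNull ? '\0' : '?';
    Latin1Result result;
    result.bytes.reserve(text.size());
    for (char16_t unit : text) {
        if (unit <= 0xFF) {
            result.bytes.push_back(static_cast<char>(unit));
        } else {
            result.bytes.push_back(replacement);
            result.hadError = true;
        }
    }
    return result;
}
```

Recording the failure in a separate flag mirrors the way the converter reports problems through hasError() while still producing output.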

Enumerator
Default 
Stateless 
ConvertInvalidToNull 
WriteBom 
ConvertInitialBom 
UsesIcu 

Definition at line 37 of file qstringconverter_base.h.

Constructor & Destructor Documentation

◆ QStringConverter() [1/5]

QStringConverter::QStringConverter ( )
inline constexpr protected noexcept

Definition at line 131 of file qstringconverter_base.h.

References iface.

◆ QStringConverter() [2/5]

QStringConverter::QStringConverter ( Encoding encoding,
Flags f )
inline explicit constexpr protected

Definition at line 134 of file qstringconverter_base.h.

References QStringConverter().

Referenced by QStringConverter().


◆ QStringConverter() [3/5]

QStringConverter::QStringConverter ( const Interface * i)
inline explicit constexpr protected noexcept

Definition at line 137 of file qstringconverter_base.h.

References iface.

◆ QStringConverter() [4/5]

QStringConverter::QStringConverter ( QAnyStringView name,
Flags f )
explicit protected

Definition at line 2407 of file qstringconverter.cpp.

◆ ~QStringConverter()

QStringConverter::~QStringConverter ( )
protected default

◆ QStringConverter() [5/5]

QStringConverter::QStringConverter ( QStringConverter && )
default

Member Function Documentation

◆ availableCodecs()

QStringList QStringConverter::availableCodecs ( )
static

Returns a list of names of supported codecs.

The names returned by this function can be passed to the QStringEncoder and QStringDecoder constructors to create an encoder or decoder for the given codec.

This function may be used to obtain a listing of additional codecs beyond the standard ones. Support for additional codecs requires Qt be compiled with support for the ICU library.

Note
The order of codecs is an internal implementation detail and not guaranteed to be stable.

Definition at line 2650 of file qstringconverter.cpp.

◆ encodingForData()

std::optional< QStringConverter::Encoding > QStringConverter::encodingForData ( QByteArrayView data,
char16_t expectedFirstCharacter = 0 )
static noexcept

Returns the encoding for the content of data if it can be determined.

expectedFirstCharacter can be passed as an additional hint to help determine the encoding.

The returned optional is empty if the encoding is unclear.
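As a rough illustration of how an encoding can be inferred from the data, the following standalone sketch in standard C++ inspects only a leading byte order mark and returns an empty optional when none is present. The names DetectedEncoding and detectByBom() are hypothetical. Qt's encodingForData() may apply further heuristics, including the expectedFirstCharacter hint, which this sketch does not model.

```cpp
#include <optional>
#include <string_view>

// Hypothetical names for illustration; Qt's encodingForData() may apply
// further heuristics beyond the byte order marks checked here.
enum class DetectedEncoding { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

std::optional<DetectedEncoding> detectByBom(std::string_view data)
{
    auto startsWith = [data](std::string_view bom) {
        return data.substr(0, bom.size()) == bom;
    };
    using namespace std::string_view_literals; // "..."sv keeps embedded NUL bytes
    // The UTF-32 marks are tested before UTF-16 because FF FE 00 00 begins
    // with FF FE. UTF-16LE text starting with a BOM and U+0000 is therefore
    // misread as UTF-32LE, a known ambiguity of BOM-only detection.
    if (startsWith("\x00\x00\xFE\xFF"sv)) return DetectedEncoding::Utf32BE;
    if (startsWith("\xFF\xFE\x00\x00"sv)) return DetectedEncoding::Utf32LE;
    if (startsWith("\xFE\xFF"sv)) return DetectedEncoding::Utf16BE;
    if (startsWith("\xFF\xFE"sv)) return DetectedEncoding::Utf16LE;
    if (startsWith("\xEF\xBB\xBF"sv)) return DetectedEncoding::Utf8;
    return std::nullopt; // no mark: the encoding is unclear
}
```

Returning std::optional rather than a default value lets the caller decide what to assume when nothing can be determined, which matches the contract described above.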

Definition at line 2524 of file qstringconverter.cpp.

◆ encodingForHtml()

std::optional< QStringConverter::Encoding > QStringConverter::encodingForHtml ( QByteArrayView data)
static

Tries to determine the encoding of the HTML in data by looking at leading byte order marks or a charset specifier in the HTML meta tag.

If the optional is empty, the encoding specified is not supported by QStringConverter. If no encoding is detected, the method returns Utf8.

See also
QStringDecoder::decoderForHtml()

Definition at line 2609 of file qstringconverter.cpp.

◆ encodingForName()

std::optional< QStringConverter::Encoding > QStringConverter::encodingForName ( QAnyStringView name)
static noexcept

Converts name to the corresponding Encoding member, if there is one.

If the name is not the name of a codec listed in the Encoding enumeration, std::nullopt is returned. Such a name may nonetheless be accepted by the QStringConverter constructor when Qt is built with ICU, if ICU provides a converter with the given name.

Note
In Qt versions prior to 6.8, this function took only a const char *, which was expected to be UTF-8-encoded.

Definition at line 2481 of file qstringconverter.cpp.

◆ hasError()

bool QStringConverter::hasError ( ) const
inline noexcept

Returns true if a conversion could not correctly convert a character.

This can be triggered, for example, by an invalid UTF-8 sequence or by a character that cannot be converted due to limitations of the target encoding.

Definition at line 158 of file qstringconverter_base.h.

◆ isValid()

bool QStringConverter::isValid ( ) const
inline noexcept

Returns true if this is a valid string converter that can be used for encoding or decoding text.

Default constructed string converters or converters constructed with an unsupported name are not valid.

Definition at line 152 of file qstringconverter_base.h.

References iface.

◆ name()

const char * QStringConverter::name ( ) const
noexcept

Returns the canonical name of the encoding this QStringConverter can encode or decode.

Returns nullptr if the converter is not valid. The returned name is UTF-8 encoded.

See also
isValid()

Definition at line 2420 of file qstringconverter.cpp.

◆ nameForEncoding()

const char * QStringConverter::nameForEncoding ( QStringConverter::Encoding e)
static noexcept

Returns the canonical name for encoding e or nullptr if e is an invalid value.

Note
In Qt versions prior to 6.10, 6.9.1, 6.8.4 or 6.5.9, calling this function with an invalid argument resulted in undefined behavior. Since the above-mentioned Qt versions, it returns nullptr instead.

Definition at line 2915 of file qstringconverter.cpp.

◆ operator=()

QStringConverter & QStringConverter::operator= ( QStringConverter && )
default

◆ resetState()

void QStringConverter::resetState ( )
inline noexcept

Resets the internal state of the converter, clearing potential errors or partial conversions.

Definition at line 154 of file qstringconverter_base.h.

Member Data Documentation

◆ iface

const Interface* QStringConverter::iface
protected

Definition at line 193 of file qstringconverter_base.h.

Referenced by QStringConverter(), QStringConverter(), and isValid().

◆ state

State QStringConverter::state
protected

Definition at line 194 of file qstringconverter_base.h.


The documentation for this class was generated from the following files: