.TH scan_utf8 3
.SH NAME
scan_utf8 \- decode an unsigned integer from UTF-8 encoding
.SH SYNTAX
.B #include <libowfat/scan.h>

size_t \fBscan_utf8\fP(const char *\fIsrc\fR,size_t \fIlen\fR,uint32_t *\fIdest\fR);

size_t \fBscan_utf8_sem\fP(const char *\fIsrc\fR,size_t \fIlen\fR,uint32_t *\fIdest\fR);
.SH DESCRIPTION
scan_utf8 decodes an unsigned integer in UTF-8 encoding from a memory
area holding binary data.  It writes the decoded value to \fIdest\fR and
returns the number of bytes it read from \fIsrc\fR.

scan_utf8 never reads more than \fIlen\fR bytes from \fIsrc\fR.  If the
sequence is longer than that, or the memory area contains an invalid
sequence, scan_utf8 returns 0 and does not touch \fIdest\fR.
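
The contract above can be illustrated with a small stand-in decoder.
This is a sketch only, not the libowfat source: the name
sketch_scan_utf8 is hypothetical, and treating overlong forms as
syntactically invalid is an assumption of the sketch.
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for scan_utf8, NOT the libowfat source.
 * Returns the number of bytes consumed, or 0 (leaving *dest
 * untouched) if the input is truncated or invalid. */
size_t sketch_scan_utf8(const char *src, size_t len, uint32_t *dest) {
  const unsigned char *s = (const unsigned char *)src;
  /* smallest value that needs an n-byte sequence, indexed by n */
  static const uint32_t min_for_len[7] =
    {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
  uint32_t value;
  size_t n, i;

  if (len == 0) return 0;
  if (s[0] < 0x80) { n = 1; value = s[0]; }
  else if ((s[0] & 0xe0) == 0xc0) { n = 2; value = s[0] & 0x1f; }
  else if ((s[0] & 0xf0) == 0xe0) { n = 3; value = s[0] & 0x0f; }
  else if ((s[0] & 0xf8) == 0xf0) { n = 4; value = s[0] & 0x07; }
  else if ((s[0] & 0xfc) == 0xf8) { n = 5; value = s[0] & 0x03; }
  else if ((s[0] & 0xfe) == 0xfc) { n = 6; value = s[0] & 0x01; }
  else return 0;           /* continuation byte, 0xfe or 0xff first */

  if (n > len) return 0;   /* sequence does not fit in len bytes */
  for (i = 1; i < n; ++i) {
    if ((s[i] & 0xc0) != 0x80) return 0;  /* not a continuation byte */
    value = (value << 6) | (s[i] & 0x3f);
  }
  if (value < min_for_len[n]) return 0;   /* overlong (assumption) */
  *dest = value;           /* written only on success */
  return n;
}
```
.fi
Because the return value is the number of bytes consumed, a caller
can advance \fIsrc\fR by it and reduce \fIlen\fR by the same amount
to walk a buffer, stopping when 0 is returned.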

The longest sequence these functions accept is 6 bytes long.  UTF-8
as defined by RFC 3629 never uses more than 4 bytes.

scan_utf8 rejects syntactically invalid encodings, but not
semantically invalid ones.  scan_utf8_sem additionally rejects
surrogates (0xd800 through 0xdfff).
.SH NOTE
fmt_utf8 and scan_utf8 implement the UTF-8 encoding scheme, but are meant
to be able to store integers, not just Unicode code points.  Values
above 0x10ffff are not valid UTF-8.  If you are using this function to
parse UTF-8, you need to reject them yourself (see RFC 3629).
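
One way to apply that check is a small predicate on the decoded value.
The helper below is a sketch; the name is_unicode_scalar is
hypothetical and not part of libowfat.  It also rejects surrogates,
which matters when the value came from scan_utf8 rather than
scan_utf8_sem.
.nf
```c
#include <stdint.h>

/* Hypothetical helper, not part of libowfat: returns 1 if v is a
 * value that RFC 3629 allows UTF-8 to encode, 0 otherwise. */
int is_unicode_scalar(uint32_t v) {
  if (v > 0x10ffff) return 0;                /* beyond the Unicode range */
  if (v >= 0xd800 && v <= 0xdfff) return 0;  /* UTF-16 surrogate */
  return 1;
}
```
.fi
A caller would test the value stored in \fIdest\fR after scan_utf8
returns a nonzero count and treat a 0 result from the helper as a
parse error.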
.SH "SEE ALSO"
fmt_utf8(3), scan_utf8_sem(3)