Extracts a substring from each UTF-8 string in x. Positions are counted in grapheme clusters rather than Unicode code points.

Usage

utf8_substr(x, start, stop)

Arguments

x

Character vector.

start

Starting index or indices, counted in grapheme clusters and recycled to match the length of x.

stop

Ending index or indices (inclusive), counted in grapheme clusters and recycled to match the length of x.

Value

Character vector of the same length as x, containing the requested substrings.
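The recycling of start and stop described above can be illustrated with a minimal sketch. It assumes the cli package is installed and uses plain ASCII input, where each letter is a single grapheme cluster; the vector x and its contents are made up for illustration.

```r
library(cli)

# Hypothetical two-element input; every ASCII letter is one grapheme cluster.
x <- c("abcdef", "ghijkl")

# start = 2 is a single value, so it is recycled to both elements of x.
# stop supplies one value per element.
res <- utf8_substr(x, 2, c(3, 5))
print(res)
```

With these inputs, the result should have the same length as x, holding "bc" for the first element and "hijk" for the second.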

See also

Other UTF-8 string manipulation: utf8_graphemes(), utf8_nchar()

Examples

# Five grapheme clusters, select the middle three
str <- paste0(
  "\U0001f477\U0001f3ff\u200d\u2640\ufe0f",
  "\U0001f477\U0001f3ff",
  "\U0001f477\u200d\u2640\ufe0f",
  "\U0001f477\U0001f3fb",
  "\U0001f477\U0001f3ff")
cat(str)
#> πŸ‘·πŸΏβ€β™€οΈπŸ‘·πŸΏπŸ‘·β€β™€οΈπŸ‘·πŸ»πŸ‘·πŸΏ
str24 <- utf8_substr(str, 2, 4)
cat(str24)
#> πŸ‘·πŸΏπŸ‘·β€β™€οΈπŸ‘·πŸ»