
Swift Apprentice: Fundamentals

First Edition · iOS 16 · Swift 5.7 · Xcode 14.2


9. Strings
Written by Matt Galloway


So far, you have briefly seen what the type String has to offer for representing text. Text is a ubiquitous data type: people’s names, addresses and the words of a book are all examples of text that an app might need to handle. It’s worth having a deeper understanding of how String works and what it can do.

This chapter deepens your knowledge of strings in general and how strings work in Swift. Swift is one of the few languages that handle Unicode characters correctly while maintaining predictable performance.

Strings as Collections

In Chapter 2, “Types & Operations”, you learned what a string is, and what character sets and code points are. To recap, they define the mapping from numbers to the characters they represent. Now it’s time to look deeper into the String type.

It’s pretty easy to conceptualize a string as a collection of characters. Because strings are collections, you can do things like this:

let string = "Matt"
for char in string {
  print(char)
}

This code will print out every character of Matt individually. Simple, eh?

You can also use other collection operations, such as:

let stringLength = string.count

This assignment will give you the length of the string.
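Since String conforms to Collection, the generic algorithms you may already use with arrays work on strings too. The following is a small sketch; countVowels(in:) is a hypothetical helper name, and it only treats the five ASCII vowels as vowels:

```swift
// String is a Collection whose Element is Character, so generic
// Sequence and Collection methods apply to it directly.
let word = "Matt"

let firstLetter = word.first                          // Optional("M")
let shouting = word.map { $0.uppercased() }.joined()  // "MATT"
let onlyLetters = word.allSatisfy { $0.isLetter }     // true

// Hypothetical helper: counts the ASCII vowels a, e, i, o and u.
func countVowels(in text: String) -> Int {
  let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
  // filter on a String returns another String; count gives its length.
  return text.lowercased().filter { vowels.contains($0) }.count
}

print(countVowels(in: word))   // 1
```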

Now imagine you want to get the fourth character in the string. You may think of doing something like this:

let fourthChar = string[3]

However, if you did this, you would receive the following error message:

'subscript' is unavailable: cannot subscript String with an Int, see the documentation comment for discussion

Why is that? The short answer is that characters do not have a fixed size, so you can’t access them like an array. Why not? It’s time to take a detour further into how strings work by introducing what a grapheme cluster is.

Grapheme Clusters

As you know, a string is made up of a collection of Unicode characters. Until now, you have considered one code point to precisely equal one character and vice versa. However, the term “character” is relatively loose.

[Figure: é as a single code point versus e plus a combining acute accent; 👍🏽 as the thumbs-up emoji 👍 plus the skin-tone modifier 🏽]

let cafeNormal = "café"
let cafeCombining = "cafe\u{0301}"

cafeNormal.count     // 4
cafeCombining.count  // 4
cafeNormal.unicodeScalars.count     // 4
cafeCombining.unicodeScalars.count  // 5
for codePoint in cafeCombining.unicodeScalars {
  print(codePoint.value)
}
99
97
102
101
769
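The same idea applies to emoji. A thumbs-up followed by a skin-tone modifier is two code points, yet Swift counts it as one Character because together they form a single grapheme cluster. A short sketch, written with explicit \u{...} escapes so the code points are visible; the flag is an extra illustration not taken from the text above:

```swift
// 👍🏽 is U+1F44D (thumbs up) followed by U+1F3FD (medium skin tone modifier).
let thumbsUp = "\u{1F44D}\u{1F3FD}"
// 🇬🇧 is two regional indicator symbols, G and B, that form one flag.
let flag = "\u{1F1EC}\u{1F1E7}"

print(thumbsUp.count)                 // 1 grapheme cluster
print(thumbsUp.unicodeScalars.count)  // 2 code points
for codePoint in thumbsUp.unicodeScalars {
  print(codePoint.value)              // 128077, then 127997
}
print(flag.count, flag.unicodeScalars.count)  // 1 2
```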

Indexing Strings

Swift doesn’t allow you to get a specific character (err, I mean grapheme cluster) using an integer subscript. While it’s certainly possible to write a function to do this, there are good reasons for the standard library not providing it. The first reason is correctness: characters are variable in size and cannot be accessed using constant offsets. Swift also wants to prevent you from inadvertently writing inefficient, battery-draining string-processing code. You might not see problems with small strings, but performance would be unacceptable with larger strings.

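let firstIndex = cafeCombining.startIndex
let firstChar = cafeCombining[firstIndex] // "c"
// endIndex is one past the last character, so these two lines crash:
// let lastIndex = cafeCombining.endIndex
// let lastChar = cafeCombining[lastIndex]
// Fatal error: String index is out of bounds
// Step back one position from endIndex to reach the last character:
let lastIndex = cafeCombining.index(before: cafeCombining.endIndex)
let lastChar = cafeCombining[lastIndex] // "é"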
let fourthIndex = cafeCombining.index(cafeCombining.startIndex,
                                      offsetBy: 3)
let fourthChar = cafeCombining[fourthIndex]
fourthChar.unicodeScalars.count // 2
fourthChar.unicodeScalars.forEach { codePoint in
  print(codePoint.value)
}
101
769
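If you really want integer offsets, you can wrap index(_:offsetBy:) yourself. The sketch below is a hypothetical convenience (character(at:) is not part of the standard library); it uses the limitedBy: variant so an out-of-range offset returns nil instead of crashing:

```swift
extension String {
  /// Hypothetical helper: returns the Character `offset` positions from
  /// the start, or nil if the offset is out of range. It walks the string
  /// from startIndex, so each call costs time proportional to `offset`.
  func character(at offset: Int) -> Character? {
    guard offset >= 0,
          let position = self.index(startIndex, offsetBy: offset,
                                    limitedBy: endIndex),
          position != endIndex
    else { return nil }
    return self[position]
  }
}

let word = "cafe\u{0301}"
print(word.character(at: 0) ?? "?")   // c
print(word.character(at: 3) ?? "?")   // é (e plus combining accent)
print(word.character(at: 4) == nil)   // true
```

Calling this helper inside a loop over 0..<count re-walks the string on every call, which turns a linear scan into a quadratic one. That hidden cost is exactly what the standard library avoids by not offering an Int subscript; when you need every character, iterate directly or advance a String.Index.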

Equality With Combining Characters

Combining characters make the equality of strings a little trickier. For example, consider the word café written once using the single é character, and once using the combining character, like so:

[Figure: café spelled with the single character é, and spelled with e followed by the combining acute accent ́]

let equal = cafeNormal == cafeCombining // true
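String equality uses canonical equivalence, so the two spellings of café compare equal even though their code points differ. The views let you opt into a stricter, code-point-by-code-point or byte-by-byte comparison when you need one. A minimal sketch (the constant names are illustrative):

```swift
let precomposed = "caf\u{00E9}"   // é as a single code point
let decomposed = "cafe\u{0301}"   // e followed by a combining acute accent

// String equality uses canonical equivalence.
let sameText = precomposed == decomposed                           // true

// Comparing the underlying code points or bytes is stricter.
let sameScalars = precomposed.unicodeScalars
  .elementsEqual(decomposed.unicodeScalars)                        // false
let sameBytes = Array(precomposed.utf8) == Array(decomposed.utf8)  // false

// Individual Characters compare the same way as whole strings.
let sameLastCharacter = precomposed.last == decomposed.last        // true

print(sameText, sameScalars, sameBytes, sameLastCharacter)
```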

Strings as Bi-directional Collections

Sometimes you want to reverse a string. Often this is so you can iterate through it backward. Fortunately, Swift has a rather simple way to do this, through a method called reversed() like so:

let name = "Matt"
let backwardsName = name.reversed()
let secondCharIndex = backwardsName.index(backwardsName.startIndex,
                                          offsetBy: 1)
let secondChar = backwardsName[secondCharIndex] // "t"
let backwardsNameString = String(backwardsName)
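Because String is a bidirectional collection, you can also step backward with index(before:) without reversing anything. A minimal sketch of a palindrome check that walks inward from both ends; isPalindrome(_:) and isPalindromeUsingReversed(_:) are hypothetical helper names, and the comparison is case-sensitive and counts spaces:

```swift
// Hypothetical helper: walks inward from both ends of the string.
func isPalindrome(_ text: String) -> Bool {
  guard !text.isEmpty else { return true }
  var front = text.startIndex
  var back = text.index(before: text.endIndex)  // last character
  while front < back {
    if text[front] != text[back] {
      return false
    }
    front = text.index(after: front)
    back = text.index(before: back)
  }
  return true
}

// The same check using reversed(), which does not build a new string.
func isPalindromeUsingReversed(_ text: String) -> Bool {
  text.elementsEqual(text.reversed())
}

print(isPalindrome("racecar"), isPalindrome("Matt"))  // true false
```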

Raw Strings

A raw string is useful when you want to avoid special characters or string interpolation. Instead, the complete string as you type it is what becomes the string. To illustrate this, consider the following raw string:

let raw1 = #"Raw "No Escaping" \(no interpolation!). Use all the \ you want!"#
print(raw1)
Raw "No Escaping" \(no interpolation!). Use all the \ you want!
let raw2 = ##"Aren’t we "# clever"##
print(raw2)
Aren’t we "# clever
let can = "can do that too"
let raw3 = #"Yes, we \#(can)!"#
print(raw3)
Yes, we can do that too!
let multiRaw = #"""
  _____         _  __ _
 / ____|       (_)/ _| |
| (_____      ___| |_| |_
 \___ \ \ /\ / / |  _| __|
 ____) \ V  V /| | | | |_
|_____/ \_/\_/ |_|_|  \__|
"""#
print(multiRaw)
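Raw strings are handy whenever text is full of backslashes, such as file paths or regular-expression patterns. A short sketch comparing an escaped string literal with its raw equivalent; the path is an illustrative value. Note that in a regular string, the \n at the start of \notes.txt would be read as a newline:

```swift
// A Windows-style path (illustrative) written two ways.
let escapedPath = "C:\\Users\\matt\\notes.txt"  // each backslash escaped
let rawPath = #"C:\Users\matt\notes.txt"#       // backslashes taken literally

print(escapedPath == rawPath)  // true

// Inside a raw string, \#( ) still interpolates, while \( ) stays literal.
let folder = "Users"
let template = #"\(folder) vs \#(folder)"#
print(template)                // \(folder) vs Users
```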

Substrings

Another thing you often need to do when manipulating strings is to generate substrings. That is, pull out a part of the string into its own value. Swift can do this using a subscript that takes a range of indices.

let fullName = "Matt Galloway"
let spaceIndex = fullName.firstIndex(of: " ")!
let firstName = fullName[fullName.startIndex..<spaceIndex] // "Matt"
// Equivalently, with a one-sided range:
// let firstName = fullName[..<spaceIndex] // "Matt"
let lastName = fullName[fullName.index(after: spaceIndex)...]
// "Galloway"
let lastNameString = String(lastName)
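Because a Substring shares storage with the string it came from, its indices are the parent's indices, and holding on to a small slice keeps the whole parent buffer alive. A minimal sketch illustrating both points (the constant names are illustrative):

```swift
let author = "Matt Galloway"
let space = author.firstIndex(of: " ")!  // safe: the literal contains a space

// Slicing yields Substring values that share storage with `author`.
let first: Substring = author[..<space]
let last: Substring = author[author.index(after: space)...]

// A Substring reuses its parent's indices, so they are interchangeable.
print(author[last.startIndex])                       // G
print(last.startIndex == author.index(after: space)) // true

// Copy into a String before keeping the value long term, so a small
// slice does not keep the parent's storage alive.
let storedLast = String(last)
print(storedLast)                                    // Galloway
```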

Character Properties

You encountered the Character type earlier in this chapter. Some rather interesting properties of this type allow you to introspect the character in question and learn about its semantics.

let singleCharacter: Character = "x"
singleCharacter.isASCII // true
let space: Character = " "
space.isWhitespace // true
let hexDigit: Character = "d"
hexDigit.isHexDigit // true
let thaiNine: Character = "๙"
thaiNine.wholeNumberValue // 9
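These properties become handy when you process text from any script. A small sketch; sumOfDigits(in:) is a hypothetical helper, and it relies on wholeNumberValue recognizing digits outside ASCII, like the Thai nine above:

```swift
// Hypothetical helper: adds up every character that is a whole number.
func sumOfDigits(in text: String) -> Int {
  text.compactMap { $0.wholeNumberValue }.reduce(0, +)
}

let sampleText = "1 + 2 + \u{0E53}"  // the last digit is THAI DIGIT THREE
print(sumOfDigits(in: sampleText))   // 6

// Other properties answer similar questions about a single Character.
let letter: Character = "d"
print(letter.isLetter, letter.isLowercase, letter.hexDigitValue ?? -1)  // true true 13
let dollar: Character = "$"
print(dollar.isCurrencySymbol, dollar.asciiValue ?? 0)                  // true 36
```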

Encoding

So far, you’ve learned what strings are and explored how to work with them but haven’t touched on how strings are stored or encoded.

UTF-8

A very common scheme is called UTF-8. This encoding uses 8-bit code units. One reason for UTF-8’s popularity is that it is fully compatible with the venerable, English-only, 7-bit ASCII encoding. But how do you store code points that need more than eight bits?! Herein lies the magic of the encoding.


let char = "\u{00bd}"
for i in char.utf8 {
  print(i)
}
194
189
Now consider the string +½⇨🙃, whose four code points take one, two, three and four bytes in UTF-8:
let characters = "+\u{00bd}\u{21e8}\u{1f643}"
for i in characters.utf8 {
  print("\(i) : \(String(i, radix: 2))")
}
43 : 101011

194 : 11000010
189 : 10111101

226 : 11100010
135 : 10000111
168 : 10101000

240 : 11110000
159 : 10011111
153 : 10011001
131 : 10000011
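The binary output shows the pattern. A code point that fits in 7 bits becomes one byte starting with 0. Longer sequences begin with a lead byte whose leading 1 bits (110, 1110 or 11110) give the total number of bytes, and every continuation byte starts with 10. A minimal hand-written encoder makes the scheme explicit; utf8Bytes(of:) is a hypothetical helper for illustration only, since the utf8 view already does this for you:

```swift
// Hypothetical helper: encodes one Unicode scalar as UTF-8 by hand.
func utf8Bytes(of scalar: Unicode.Scalar) -> [UInt8] {
  let v = scalar.value
  switch v {
  case 0..<0x80:
    // 0xxxxxxx: plain 7-bit ASCII.
    return [UInt8(v)]
  case 0x80..<0x800:
    // 110xxxxx 10xxxxxx: 11 payload bits.
    return [UInt8(0xC0 | (v >> 6)),
            UInt8(0x80 | (v & 0x3F))]
  case 0x800..<0x10000:
    // 1110xxxx 10xxxxxx 10xxxxxx: 16 payload bits.
    return [UInt8(0xE0 | (v >> 12)),
            UInt8(0x80 | ((v >> 6) & 0x3F)),
            UInt8(0x80 | (v & 0x3F))]
  default:
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: up to 21 payload bits.
    return [UInt8(0xF0 | (v >> 18)),
            UInt8(0x80 | ((v >> 12) & 0x3F)),
            UInt8(0x80 | ((v >> 6) & 0x3F)),
            UInt8(0x80 | (v & 0x3F))]
  }
}

let sample = "+\u{00bd}\u{21e8}\u{1f643}"
for scalar in sample.unicodeScalars {
  // Compare the hand-rolled bytes with what the utf8 view produces.
  let expected = Array(String(Character(scalar)).utf8)
  print(scalar, utf8Bytes(of: scalar), utf8Bytes(of: scalar) == expected)
}
```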

UTF-16

There is another encoding that is useful to introduce, namely UTF-16. Yes, you guessed it. It uses 16-bit code units!

for i in characters.utf16 {
  print("\(i) : \(String(i, radix: 2))")
}
43 : 101011

189 : 10111101

8680 : 10000111101000

55357 : 1101100000111101
56899 : 1101111001000011
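Code points up to U+FFFF fit in a single 16-bit unit, which is why +, ½ and ⇨ each produce one value. 🙃 (U+1F643) does not fit, so UTF-16 splits it into a surrogate pair: subtract 0x10000, then put the top 10 bits into a high surrogate starting at 0xD800 and the bottom 10 bits into a low surrogate starting at 0xDC00. A minimal sketch; utf16Units(of:) is a hypothetical helper:

```swift
// Hypothetical helper: encodes one Unicode scalar as UTF-16 by hand.
func utf16Units(of scalar: Unicode.Scalar) -> [UInt16] {
  let v = scalar.value
  if v < 0x10000 {
    // Basic Multilingual Plane: one code unit holds the value directly.
    return [UInt16(v)]
  }
  // Supplementary planes: split a 20-bit offset into two 10-bit halves.
  let offset = v - 0x10000
  let high = UInt16(0xD800 + (offset >> 10))   // high (lead) surrogate
  let low = UInt16(0xDC00 + (offset & 0x3FF))  // low (trail) surrogate
  return [high, low]
}

let upsideDown: Unicode.Scalar = "\u{1f643}"
print(utf16Units(of: upsideDown))                  // [55357, 56899]
print(Array(String(Character(upsideDown)).utf16))  // [55357, 56899]
```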

Converting Indexes Between Encoding Views

As you saw earlier, you use indexes to access grapheme clusters in a string. For example, using the same string from above, you can do the following:

let arrowIndex = characters.firstIndex(of: "\u{21e8}")!
characters[arrowIndex] // ⇨
if let unicodeScalarsIndex = arrowIndex.samePosition(in: characters.unicodeScalars) {
  characters.unicodeScalars[unicodeScalarsIndex].value // 8680
}

if let utf8Index = arrowIndex.samePosition(in: characters.utf8) {
  characters.utf8[utf8Index] // 226  
}

if let utf16Index = arrowIndex.samePosition(in: characters.utf16) {
  characters.utf16[utf16Index] // 8680
}
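Because the views share the same String.Index type, you can also measure how far into each encoding a character starts, and convert a view's index back into a character position. A minimal sketch reusing the same four-character string; note that converting back only succeeds on a character boundary:

```swift
let characters = "+\u{00bd}\u{21e8}\u{1f643}"

// Offset, in code units, at which each Character begins in each encoding.
let utf8Offsets = characters.indices.map {
  characters.utf8.distance(from: characters.startIndex, to: $0)
}
let utf16Offsets = characters.indices.map { $0.utf16Offset(in: characters) }
print(utf8Offsets)   // [0, 1, 3, 6]
print(utf16Offsets)  // [0, 1, 2, 3]

// Going the other way: a utf8 index maps back to a Character position
// only if it falls on a character boundary.
let utf8Start = characters.utf8.startIndex
let arrowByte = characters.utf8.index(utf8Start, offsetBy: 3)
let insideHalf = characters.utf8.index(utf8Start, offsetBy: 2)
let arrow = arrowByte.samePosition(in: characters).map { characters[$0] }  // the Character ⇨
let inside = insideHalf.samePosition(in: characters)  // nil: second byte of ½
```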

Challenges

Before moving on, here are some challenges to test your knowledge of strings. It is best to try to solve them yourself, but if you get stuck, solutions are available with the download or at the book’s source code link in the introduction.

Challenge 1: Character Count

Write a function that takes a string and prints out the count of each character in the string. For bonus points, print them ordered by the count of each character. For bonus-bonus points, print it as a nice histogram.

Challenge 2: Word Count

Write a function that tells you how many words there are in a string. Do it without splitting the string.

Challenge 3: Name Formatter

Write a function that takes a string that looks like “Galloway, Matt” and returns one which looks like “Matt Galloway”, i.e., the string goes from "<LAST_NAME>, <FIRST_NAME>" to "<FIRST_NAME> <LAST_NAME>".

Challenge 4: Components

A method named components(separatedBy:), provided by Foundation, exists on String. It splits the string into chunks delimited by the given string and returns an array containing the results. Your challenge is to implement this yourself.

Challenge 5: Word Reverser

Write a function that takes a string and returns a version of it with each individual word reversed.

Key Points

  • Strings are collections of Character types.
  • A Character is a grapheme cluster and is made up of one or more code points.
  • A combining character is a character that alters the previous character in some way.
  • You use special (non-integer) indexes to subscript into a string and access a particular grapheme cluster.
  • Swift’s use of canonicalization ensures that the comparison of strings accounts for combining characters.
  • Slicing a string yields a substring with the type Substring, which shares storage with its parent String.
  • You can convert from a Substring to a String by initializing a new String and passing the Substring.
  • Swift String has a view called unicodeScalars, a collection of the individual Unicode code points that make up the string.
  • There are multiple ways to encode a string. UTF-8 and UTF-16 are the most popular.
  • The individual parts of an encoding are called code units. UTF-8 uses 8-bit code units, and UTF-16 uses 16-bit code units.
  • Swift’s String has views called utf8 and utf16 that are collections that allow you to obtain the individual code units in the given encoding.
© 2024 Kodeco Inc.
