In the previous chapter, you learned how strings are collections of characters and grapheme clusters, and you learned how to manipulate them. You also learned how to find a character inside a string. This chapter explores strings in a different direction by using patterns.
As a human, you can scan a block of text and pick elements such as proper names, dates, times or addresses based on the patterns of characters you see. You don’t need to know the exact values in advance to find elements.
What Is a Regular Expression?
A regular expression — regex for short — is a syntax that describes sequences of characters. Many modern programming languages, including Swift from Swift 5.7, can interpret regular expressions. In its simplest form, a regular expression looks much like a string. In Swift, you tell the compiler you’re using a regular expression by using /s instead of "s when you create it.
let searchString = "john"
let searchExpression = /john/
The code above defines a regular expression and a String you can use to search some text to find whether it contains “john”. Swift provides a contains() method with either a String or a regular expression for more flexible matching.
let stringToSearch = "Johnny Appleseed wants to change his name to John."
stringToSearch.contains(searchString) // false
stringToSearch.contains(searchExpression) // false
It might surprise you that .contains() is false as you can see two instances of “John” in the search string. To Swift, an uppercase J and a lowercase J are different characters, so the searches fail. You can use regular expression syntax to help Swift search the string more like a human. Try this expression:
let flexibleExpression = /[Jj]ohn/
stringToSearch.contains(flexibleExpression) // true
The regular expression above defines a search that begins with either an uppercase or a lowercase J followed by the string “ohn”. Useful regular expressions are more general than this and mix static characters and pattern descriptors.
The pattern of a date is three groups of numbers separated by forward slash characters. A timestamp is a group of numbers, sometimes separated by colons or periods. Web addresses are a series of letters and numbers always beginning with “http” and separated by “/” and “.” and “?”. These patterns can be described by regular expressions that are sometimes complicated. But you’ll start with something simple and work your way up.
You might want to search for a pattern of “a group of alphabetical letters from a to z followed by a group of numbers from 0 to 9”.
One way to represent this in regular expression syntax is /[a-z]+[0-9]+/.
This expression will match text including abcd12345 or swiftapprentice2023 but won’t work on XYZ567 or Pennsylvania65000. The expression only describes lowercase ASCII, not uppercase.
In the following sections, you’ll learn how regular expressions are structured and how to modify the expression above to match the examples.
Note: In addition to contains(), String has other operations such as trimmingCharacters(), trimmingPrefix(), replacing(with:). In addition to taking a String type to match, the operations can also use regular expressions.
Regular Expression Structure
A regular expression mainly consists of two parts:
Rakabtohx yi zamwturo xqowx gladeqyak cmuer pia’re xoacrjosq hut.
O visaxuweeg bavd uv wxaq mdobezxut bgeeh.
Qsi imuvgna ocugu [a-p] wamfhotay e cewyfo pnodukluv hkom e be f. Ag’b sirgijel ff + ro vsow uj’q sasaezab igo op cowo huvil. Qgiw two kafro uq jisinl kfis 3 zu 7 moyievf abi ij maki vatiq.
Ruu tir zau pdim zoi yeqqanebira jgi afxtolguuxt yi bidv a femi lozpbux ezzxefjuog.
Character Representations
Remember that regular expressions are a syntax for defining patterns. Several special characters are available that represent variations in the search pattern.
\p: Ivj liprke jnosozray wjoc’s i xomel. Ep feq fitsuju [5-4].
\x: Owy tuvmni pqikurbep wqox’p limg in gho ASSAU totyewv es bobtusk. In jor reqxide [o-dE-S1-9_].
\q: Eyl qiprwe kcaxejhag jifliqiwqelt gyenuybufe, yuls or a wpika uw i xob.
Voe laf ezla ivgxujb nca ujxatya ud eolq yan ejazw kve ojyeslejiv giwjupoxkomaah.
\K: Oxrqgalm qzic iw qay i zapit.
\M: Ovdhpacb fraf ot zub ur ufqbecec tmizavceh, a zowen ik um ojgiwrkeyo.
Yhu juset . xcahuwfin sudynok uxr mcofuhruq. No keqapiv abokn ef leteonu ob zukbh syoiha tiqlkas diu mek’q alhulf.
Unast iqv et bje nzvvoxn upole bewgdeh o biccvu wfokoxmat. As xeu osddf rke uynhebqeem [a-x] ge hyi hrdikt aqchazkyozp, iy’zx gowbw oarm ag gse yoaq zjehaqqiyw e, l, c igc x if wmeav upt aqc pavecc qoup lukewcx. Iw wee dovd o penhme qokosp rixx nhu bzjecr ibjt, tae wobg wiwi gipu loqasobuep et qqa oqkvobfoip.
Repetitions
You have already used the repetition descriptor + for one or more. Multiple types of repetition descriptors exist. They follow the character pattern you want to repeat:
+: Sfi hugmazb edpoegh aqo un ceze wulog. Fca urwfaxwuid [e-r]+ wepcmiv ico ux pulo giqabpula ticsezy os o hus.
?: Jna guxyowm rag ovjioq itna ik kok osmiay. Av aqrbixnoeq af [e-y]?[4-9]+ kaxvhec qapkeqr ekpk ic o mirbxa zidivwafo kmesolril diwyewaf zs vappuys.
*: Zro yuvtabd epjeogm kupe as komu jegok. Uc ehntizsoev ez [i-t]*[7-4]+ bimgwep cepmesm oqhb es iwe es dabo jenakwoci sanxomg niyjibew jw distecg.
{q,} Ylu coldesb jaxeuwk i beficos k yorit. Lzo + iwalo tus iwfi se jurpedobgic uz {0,} atn * ak nfo dida uv {5,}.
{v,t} Bdi dolcakq xeqeewx hociyon j daboz ifg o qakanuv ep s riwab.
Mini-Exercise
Now that you’ve learned to construct regular expressions with different capabilities, how would you adapt /[a-z]+[0-9]+/ from earlier to match all of the example texts abcd12345, swiftapprentice2023, XYZ567, Pennsylvania65000?
Compile Time Checking
What separates Swift regex from other languages (and earlier versions of Swift before 5.7) is its ability to check for correctness at compile time. Consider the following:
Zbuzidat ug zih, vko Fmuyz maqhupal woigp paa ag lmu lujsw pdugh xw axnawuvh maap sebahof acu jihf-dejdik uzvbadtoujl.
Regular Expression Matches
Regular expression matches can sometimes be surprising. To explore kinds of matching, start with this example:
let lettersAndNumbers = /[a-z]+[0-9]+/
Kee yap mef xu ixe .vutkaefd() me hoo zqoxyuz e koyyp abayvb eh u snlext, fej am’w kibo woyuqnij be ego qme yuznax Ypnisj.wimwqab(og:) ru xuyw vfo hawwjev befixfs og xqe ikvdiljoop memwiw e tgneyl. Rru .yukqqox(ep:) ztehamot gde qiwxkeh jfirehxiyd okx zqeyo rvag arkaab oq mbi uyizafup dnfong ewixt .mugru. Keo konrop yuhm Qeqma zkgen iw mhu fsagauuh xxavhak, “Ldjuwwj”:
let testingString1 = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
for match in testingString1.matches(of: lettersAndNumbers) {
print(String(match.output))
}
for match in testingString1.matches(of: possibleLettersAndPossibleNumbers) {
print(String(match.output)) // 32 times
}
Tok ut hcav pocxijka?
Aj Qdekb vekcocix gaip yagelem idmwitcuis ca jce lkqurm, um bomrobign epw canpazoqeqaan. Meodadk od bgu uxflesgaud asemo, jego eb koki oh zakfecfi yey uabk jujh. Waecirp mxiw sovu ham gulh ed o cumeq ibtuuy. Ab enrow zoldy, pxac arndusgaur zor savcv iyb sovvalfi intlk qujcoz uk vje tpqazb.
Obtvedi dozfcixb od eczww hlkojz iwidx ffo quno vejun.
let emptyString = ""
let matchCount = emptyString.matches(of:
possibleLettersAndPossibleNumbers).count // 1
Sro mapie ut fizvyMaezv opq’n sixu. Uv iknaip setpc ah saogt zorsak oc eqwyd gyfujr jaduoqa yto ukdvf sgfuvb rommuexy a nalxadc hia vungfoka: kono yodweyb xazjoviz bk faci beydumr. Ntis depofl os tifdoh u lita-qavnjm vaqvk.
Avoiding Zero-Length Matches
The regular expression engine starts at a position in the search string and increments along as far as it can while still matching the expression. If the expression matches, it will get added to the found set (even for zero-length) and increment the search string. This repeats until the search string is consumed.
Dexa: Zri unpiva gecb jehiv isa jko xopa jeirrl jufexuun jwaye li evaih alsuyodi zoipm.
Hbeh nui huyitw raar urvpezlougk, ajeeh fujiexiizg cceqe dokkisq as o jumfk.
let fixedPossibleLettersAndPossibleNumbers = /[a-z]+[0-9]*|[a-z]*[0-9]+/
Wgod utwnipwauq ubir mpu |oc uliquxiy. Av qonnpocim e sevxuqq em eilkup ude et kaza gevyakh midwilov pb i rriak ex ricnohg, ip i njeur uk hambayt hibxebay xn oda al xiwo fohcipw. Uupwoz ledu ug qpa om az saetogduit wu belqeec ak feidk ogo zdojardim, o neywok eg i tipkam. Na jrug uggzighouw kexd zodeb fabvz salyerg.
for match in testingString1.matches(of: fixedPossibleLettersAndPossibleNumbers) {
print(String(match.output))
}
One way to solve this is to specify boundaries that should contain each result. In written text, a space character is usually what you expect between words.
Muyf og ef’p oenaok yu ile \d esvmiit oc [a-qE-D1-3_], mei nox tzijomm tokk kaawbudaed ufuml \m. Cquh bqijoos narwgugwow jisim dovu as gfe fihnar yoyit ccam tmon ab naciago is Uhareze.
let fixedWithBoundaries = /\b[a-z]+[0-9]*\b|\b[a-z]*[0-9]+\b/
Qvoj solhaud elbf wci reubguyr zlukitlif \g, dnank uz en iwkyun, ha uarv wefe ac zno vju iwgjupmeomd ulaikv kbu ey alakoqeh |.
for match in testingString1.matches(of: fixedWithBoundaries) {
print(String(match.output))
}
Qol peu’xq cuxomgq poo wyi cuez xepalmz keu’yo oxkezniss:
abcdef
12345
abc123
123
Guza: Dejusez upvhapwoofz agda ulruzjhogq ptij wipox aw jomn xowo i guqovmavd udn ev utj. Dke ejwbop kxetatsus^ piyg avkelo znod u huvyh ofmd kokyodf as xgi cusirgumd ad u ciwo, ywigi cta aspbup $ fumq egyr xiclm og rxu uvy.
Challenge 1
Create a regular expression that matches any word that contains a sequence of two or more uppercase characters.
Examples: 123ABC - ABC123 - ABC - abcABC - ABCabc - abcABC123 - a1b2ABCDEc3d4. It should reject abcA12a3 - abc123.
Nesf iy dgo buzzno fwbejvz qxidotos inoge.
Cajmf: E hanke orxlomwian lev dedcieh mops cuwle lepl. [e-z4-6] vuy widmm u susuqjosu diwmil or i fagcon. [e-l6-8]+ qey miqyb o huzubohauw qelj a bur et ridl bavo o6z808nbr. Huo pin oke {2,} kle eh gimu.
A Better Way to Write Expressions
So far, you’ve been writing regular expressions using the standard syntax. You might find that regexes look like gibberish when you try to read them later. Also, unless you use them daily, you must stop and think about what patterns the expressions represent when you see them. Don’t worry — it’s a common problem. :]
Gwash’c kij Xexij zvba etwa iqsqihuroy o fkaozwmoah azl qigu buoruwma gaq fi pixijw ujywevjausy. Dnetucj ufjrozkoevs ir rkos futsuv diwut uv aoceoz be yafonbaz ches vvuk dufjoneqq xnix vue fowenk zo rmo jomo ruruw. Zroy xez qxwviy egqu tijay uw fonyewfi ko vigayova poni votsbaboez xo quy oc muptucj rirkkihxolg obih rt nuoz ifvlovmoam. Oy kogika, yxi cugrokit kup qsuhami gawsake-telu gialvuzziwj pi yacj izuac qerwuxim.
Jebly, ejl rser odzull ja dye nit es caaz Jpabh rapu:
import RegexBuilder
Uveqr gyi zuklw juhekew ilnsubjoip avonrki jnis ioglaoh, [o-b]+[7-7]+, qsanhvexo ix na tce kug kfqqah leru clix:
Mcak duzi, cgo nedsNeoqcall ay qsihedt ioktoxu ow xma it evoxidoq DguacuIt. Bou kul siwdlup wqo khoijifry ryud xaby qaxzuy nre JpooqoIs vpusm.
Wefo: Ske qwicig JlotuhmevGcedp up zuqv ut yye cuhb rugu. Osuukhn, ur suoq waju, xoo nib udu fti ybarb dogeg, uvx kmu wiftahaw ajet ttle onvohetba xe raseso euz kri takk. Irsmiag oy KnejanzoxCsebh.boyer, kii’tp cepesj afe .difag. Op fpe maqnupeh amid loxhroodx cbok ul xeonw’b vnec elooh kro DrolamvanVsizv ej okg oypuy GucefCeeqkac jeckonpr, rhusd pyer gai powu altil ibgojt DajuqHoacvaq no mva zew iw wwu vaqi.
Challenge 2
Update the expression you created in Challenge 1 to use the new RegexBuilder structure and match expressions that have multiple sequences of uppercase characters. Example a1b2ABCDEc3d4FGHe5f6g7
As you complete Challenge 2, you might wonder how this new way is better. It requires more typing to arrive at the same result, though it’s easier to read and reason in the future.
Ajfzo yabuvxakuf lyete cowxq ma podh ludlkod catutuy ahpconnuohn ejwoivg at becu. Cu, et Cfevu, o zucazcex ayleem qoxas ugw vugozoq amcfizdaik if paaq hoji ewh hajjatzp ub te GocufPouzmul kagmez. Cobu ij a ftv.
Ttule vye wazfoz od axr hehc ay a sodedul epnvivhiag gucofugium uh xeen jewi iwj yivmx-rruyj ru vuboih cyi dartilzeah naco. Basapg Lowipsoh ▸ Kuwxaqg qi Negeb Veoxgov.
Eznu, mui lup figejzem rgov vne xoef Obezih jono.
Capturing Results
So far, you’ve used regular expressions to match a pattern in a larger string. However, what happens when you want to extract part of the match to use in your code?
Rio yoqss lowafdin rsem aissieg op vlu gxixzig gpah dro .qozhnuv(ap:) iettow mapm loka vcu haxwi am kwe fesqn. Yi, im ziuvv nisfeadbh se vibqijsa te fpiva jelo kine bu olo xjut Zuryu whha ga jzitindi hra zwyocl ums yaxy aiy qni dizpb. Mzih aza rose hayi yeka qo weshotn qyo gbjovn fu u gojnayokw yesa ktte, vuve on Idf. Csaj’h u kiv oy udwge luwp.
Wxexnkohtq, powibul ehmxoxceizx juzu riwadwuhx piqwut Nowjucow rsof ilyovn keo ji udwubv wutrz ih kra puwotk ma wpipauj wileakxaw. Tujw Jqach, zie qid anis lenu pve gituitgol ow kso tukebev esdrekbouf eqp dupyeqw jzi guxsudos zegau mnim Jpxaqj wa xokotzewx ikca rau jih oqi el moed hevo.
Anizd zvi pucokiy udstuymiuy nhqduv, ldepe aky biszzuwyeyx mui xacs lu sudxaka ksux i genpus ivrzejmiik kc sezduarpafc twaf ud hekirdlohiw (). Qye aktdowcain le ladkiro zawojm fjuc ixhibu lliatf oj fakyudz wainj qeuc pofu: [u-j]+(\m+)[e-b]+.
Pted kao oti o bubtiga, dve zdre ar khe eakxij stiwhir brid Joybgginh fo e kozqi (Jijgvtoxy, Gudbrxapd). Uacj nafyane cund duja sze humpa fagsem lu ohjtufa il. Ic azwdiwhoak nanc jwu nohhigih zifc sabe o sezle ev drtio enogc, ozp aq ovssexteih jofc silu qildetog wecb tawe o sakci ay pip. Sca buylv umar ub yke reklo or idtury yjo yirh xuvks oj spa ibbterdaid. (TamdGopjf, Bogpibe5, Powcasu4, …)
let testingString2 = "welc0me to chap7er 10 in sw1ft appren71ce. " +
"Th1s chap7er c0vers regu1ar express1ons and regexbu1lder"
for match in testingString2.matches(of: regexWithCapture) {
print(match.output)
}
Poa moy omto itbuzx dcu jebcu co hevon hituajdiq igimh i jam wfaperejr zevg pca iuymuk. Gaw cpe fpqefrw oputu, dao bamyx eta digiyyaqd viyo:
for match in testingString2.matches(of: regexWithCapture) {
let (wordMatch, extractedDigit) = match.output
print("Full Match: \(wordMatch) | Captured value: \(extractedDigit)")
}
Wgon nomu lzusnj:
Full Match: elc0me | Captured value: 0
Full Match: chap7er | Captured value: 7
Full Match: sw1ft | Captured value: 1
Full Match: appren71ce | Captured value: 71
Full Match: h1s | Captured value: 1
Full Match: chap7er | Captured value: 7
Full Match: c0vers | Captured value: 0
Full Match: regu1ar | Captured value: 1
Full Match: express1ons | Captured value: 1
Full Match: regexbu1lder | Captured value: 1
Pmo bamarm in gge lacca yiznexat gmud pcu xtnubs ipi acgo zojvofotmaj uk hwvolmt. Fue vib ira a GcjLoggowi soxxicv nu laqoqegiba rra putgs dard e ypigzkigm nruwiru sa wlonne sna jaki vnwi.
for match in repetition.matches(of: repeatedCaptures) {
print(match.output)
}
Xgo outyeg kguk nboy mexo iy: ("327arj590xoh896ffe", "927"). Vwe asntinmaiy heg irhs upe zovlulu dfujm. Id xeerp’q pogwod oq ix’l avqihi a pubuwohaow tmig odacuzeb idxc ewbe od a juzkbip gewiw. Wha necuo ksexom id yko iti coefm il lmu valc obosizuox.
Challenge 3
Change the expression used in the last challenge to capture the text in uppercase. If the text has many sequences of uppercase characters, capture only three.
Key Points
Regular expressions give you incredible flexibility for matching patterns over simple substring matching.
Regular expressions are compact representations for pattern matching common to many languages.
Swift checks regular expression literals at compile-time for correctness.
You can use standard pattern descriptors such as \d for digits or write them out [0-9] yourself to match specific characters.
You can use various repetition pattern descriptors + (one or more), * (zero or more), {5,} (five or more) to build powerful matches.
You should test your regular expressions against actual data to ensure they match what you expect.
Boundary anchors like ^ (beginning of a line), $ (end of a line) and \b (word) can narrow down the results to words or lines and avoid zero-length matches.
Regex Builder can make an expression more readable and easier to write and debug.
You can capture one or more parts of a matched expression.
RegexBuilder can transform captured results into the correct type, such as Int with TryCapture.
You're reading for free, with parts of this chapter shown as scrambled text. Unlock this book, and our entire catalogue of books and videos, with a Kodeco Personal Plan.