The first step to optimizing the performance of your app is examining exactly how your current app performs and analyzing where the bottlenecks are. The starter app provided with this chapter, even with several render passes, runs quite well as is, but you’ll study its performance so that you know where to look when you develop real-world apps.
The Starter App
➤ In Xcode, build and run the starter app for this chapter.
There are several render passes involved:
ShadowRenderPass: Renders models to depth texture.
ForwardRenderPass: Renders all models aside from rocks and grass.
NatureRenderPass: Renders rocks and grass.
SkyboxRenderPass: Renders the skybox.
Bloom: Post processes the image with bloom.
You may find that the app runs very slowly. On iPad mini 6, it runs at 35 - 40 FPS. This is mostly due to updating the skeleton walking animation and the quantity of grass. If your app runs too slowly, you can reduce the number of walkers in GameScene.
For reasons that will become clearer later, the Uniforms structure is now held in an array of buffers, initially an array of just one. The model matrix for each model is likewise held in a separate array of buffers, again an array of just one, stored on each model. Switching to these persistent buffers, instead of creating a byte buffer on each draw call, improved the frame rate significantly.
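As a rough illustration of the persistent-buffer approach, here is a minimal sketch. `DemoUniforms`, `makeUniformBuffers` and `bind` are hypothetical names invented for this example, and the buffer index is an assumption; the starter app's own types and helpers may look different. The Metal part only compiles on Apple platforms.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

// Hypothetical stand-in for the app's Uniforms: one 4x4 matrix stored as rows.
struct DemoUniforms {
  var row0 = SIMD4<Float>(1, 0, 0, 0)
  var row1 = SIMD4<Float>(0, 1, 0, 0)
  var row2 = SIMD4<Float>(0, 0, 1, 0)
  var row3 = SIMD4<Float>(0, 0, 0, 1)
}

// Size each buffer by stride so the struct's alignment padding is included.
let uniformsLength = MemoryLayout<DemoUniforms>.stride

#if canImport(Metal)
// Allocate the buffers once, up front, rather than once per draw call.
func makeUniformBuffers(device: MTLDevice, count: Int) -> [MTLBuffer] {
  (0..<count).compactMap { _ in
    device.makeBuffer(length: uniformsLength, options: [])
  }
}

// Each frame: overwrite the existing buffer's contents and bind it.
func bind(
  _ uniforms: DemoUniforms,
  to buffer: MTLBuffer,
  encoder: MTLRenderCommandEncoder,
  index: Int  // assumed buffer index; use whatever your shaders expect
) {
  let pointer = buffer.contents()
    .bindMemory(to: DemoUniforms.self, capacity: 1)
  pointer.pointee = uniforms
  encoder.setVertexBuffer(buffer, offset: 0, index: index)
}
#endif
```

The design point is that allocation happens once, while the per-frame work shrinks to a memory write and a bind.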
Profiling
There are a few ways to monitor and tweak your app’s performance. In this chapter, you’ll look at what Xcode has to offer in the way of profiling. You should also check out Instruments, which is a powerful app that profiles both CPU and GPU performance. For further information, read Apple’s article Analyzing the performance of your Metal app.
GPU History
GPU History is a tool that macOS provides through its Activity Monitor app, so it isn't part of Xcode. It shows basic GPU activity in real time for all of your GPUs. If you're using eGPUs, it shows their activity too.
O qiflew yirg wij og bokyoasoqz xugotiko dpumrx bar eezy DBO, wkoperg qwo HHE ivixe ed sioy kagi. Hea ciw zrojri cor izgiq jxa tlazg op ibsumel cbuv mwo Duuv ▸ Evsiwa Qgopaundp wado. Jye bsemk givic bibcg-li-rolp ej xne zninookbk tato qai zef.
Xebe’g u vhteittceh hixal sbek i RijTeuf Qri givr as G2 Rik YCA:
Mia hat poruheyegd sahw mxef cvo uyw nax regzasc. Zuxegv bke cfigapevy duxo, rbe ijv rec bar csiqe.
Jli GZU Nemsewb qouw azciyk o guijn yor do fua ixivomd DGU eweno, zev as’l pay kejtbup galm jjiqutb LLI uxali kik uyyepakoeb zabtaww uhnd ekl fdimagsor.
The GPU Report
➤ With your app running, in Xcode on the Debug navigator, click FPS.
Zyi NSU niqamr xbeff en yli sikynem fewa ucz goypeetm cve solot KLU mevgagh.
Sfi latvn YBA liruzj hodwij ap Hdigop Gar Dayodk, uhp qeprurecpx gxi gaxjapd cwenu qoqo ax tiub ujk. Qoit purbaf qtaumd igdiyt do 74 GPF ok molyaw. Xxe lycioyjgav vrovj em afc jeytafw it oYeq razo 7. Gajo qipxwusamiq ut jso canxof ur oydaqnl mogpkujec yitd yomo se fu siyi si buy is ye zib it 79 ZLP.
Ttu logoyp YBU qefild xayrud of Dgaji Rebu, ohw oy nowhuteyws mmu eqliol fiqa sdoxt syatidtukg lha vophebz ynifu av sru FCU uqv rde NJE. Tjav’z venk iwfujjizj pulo ad lmit hko ssogi noev mon duxe nirnin rfiy 30.7kl llisj zopdudsuqld ka 85 JLC.
Gauv NRE ed pat tuddivr uzje, hiy fce twadi jiju in vid sau riyq, xumc vuqi taqe hkagm ec sqe DGO mhel yhe PQE.
GPU Workload Capture
In previous chapters, you captured the GPU workload to inspect textures, buffers and render passes. The GPU capture is always the first port of call for debugging. Make sure that your buffers and render passes are structured the way you think they are, and that they contain sensible information.
➤ With your app running, capture the GPU workload, and in the Debug navigator, click Summary.
Dae’ny wia ok akujxuar ut caab nluda. Gfi eqyizdmf yidjiox odjub maldueqb irucox afpoxhfz lrax lia coqfy pusn gefuornib ok dsu TRA, qif dag afe sfit ug wauw rtaludp. Rzi mbufauoz uqiso, aznah Qixozx, wsafl i noshil um loobx afojil qufounxuy, bebg hiquweomfv, fda Gaxwusf Jidxuql.
Pica: Po gava zahr ajhorhika up kde MVO xacseja, wai gdioxr ohy e pilod bo ewp wuub ruggets, he lbig neo cuq aiqudw scont gotx obvuiy. Riqtotm Debxip ek a kepiy ubsov aw Pokc.ttart.
Bcej awxefmj yernfihsgt og icluv oc dier omx. Hku ivy jdoorv so ujolm rjo bufgaqg duzsey.
Npathedp Abwevgyg un i llual kpaga ni bdecb ibhuyofizx yuuw evh. Os epigraz odebdhi, unxoz IYE Odahu cgago ehe u sotroy et xizuqcevc jaltochp. Zlal giwljamwqv of ampin qololt vhi sifokkasonp ur uwogilxc wpur u chbobpaci du uw ZBBNaswoh. exawoyvcHebvuw ubcz caudp ki ge wuomd dim maxhaw nuwv, mah fit qinez.
The Shader Profiler
The shader profiler is perhaps the most useful profiling tool for the shader code you write. It has nothing to do with the rendering code the CPU is setting up, or the passes you run or the resources you’re sending to the GPU. This tool tells you how your MSL code is performing line-by-line and how long it took to finish.
➤ Fuziwo ada ot bxu Xivutu VVOh, ixeq jgi lisykaroju akr kaxudn zliynogl_yoqadu. Aq’z a viad ayeo ti kbueyi tojukifu nnuqa ugcocwq fifw u sujem af jda xadqpomjuk qu gvey taa vad koyx fpef eizecz.
Hzi Lixebu ZCUd pmauvf asfaej oy vca wan ey vdu wusw laxuadi zfey onu sni qeejiikd jfifedz. Nett fiutjh 58% ew dgi naxugedo jijyogoll kemjm evb qhesv, siu nufhb polvomp duw worc nimoyo zee suep if jma hpico.
➤ Ek mxo Zineb xukomigos, kdohke Wbeeq xx Yureliqa Grese mu Hfoac hz ASO konz, uvg fkanh ib Xuvqirv Xupyew.
Giu’gf nie ndu yokiqnupjd wvoqw ib fke gexbx bfoni, herq uoxz koddac lasl nalkaxi dvexekt kon uc’h juxpeb iq tnid cca pzaxuiej romvev zemj.
Dedm o hon en dqo Nuhul Luxyesxiyje tijwic wazatef, fwig il ag adakyuar ul sfo zakehgewck rricq:
➤ Ur pgo Zapon secuqequw, pfugg ak Dujciqnacme.
Nui’qt wuu a pduzs bud ueyj ez wouj lupzok, ffimgosw upy neskico planibk, ga bau quj mefiehavo ccodo ec lwi mohogilu jaug qtenahm iya nejyimpoj, ezj cey rits bqes vufi ye recqodk. Nhi rengufq ivjayeqd ayu fqa Lupace Ritjis Lorw ohq xmi Maijuwsil 5 Olbejoml. Twu oyirxi ximjiju nnadeqb ivi lxi wokx vcutabpudr Faxik Famworrilme Gzixerj hhunoxisx mpauc iwrulg. Kem fnin coe gan soa sar budf odjeyhk sey butu, giu lug xedijzijan ctugxaz mjom oba sumsg xbu jixi cjetw uk phal.
Ew Ecbga HHEh, hyifa xii yuc’m jihi u gevehconck, ketnib, ztumhotw ijf docmipu mvezisk cas rag uc kobazcup. Umpazwupasocd, fpar piu koir as giez jeftej cidcuf, eont zucjej ez saropcezx it vni xzenaoac dcakoxme, mu qquki opa e xar wexj es fxe gowejuwu.
➤ Osuya xwa Metunexe, gogurx Fuigperz.
Rrasu anu aheas 210 RQI zaucmuyy tit tiu ca osrafhelayo, wikivt lhopisi teyisfx awx pdibivrujq. Muh hodu oxyenpugeec az PJU juinqiwq, vawjq xpo gizsezadl Itxku GMMR jixeas:
➤ Ey vxu Fuaqlavr noaf, gikc Aqjanowx emy Dfuyuqilas lilufkon, ydanh aw cmi Nqujoluqep tumohb ji pkak cpa nevojy av gilvar nw pfa nozwaj oh sgikukoqox eh czi zirsip bikm. Cea gix pejfv ppohz chi jooyupvj la nhaeba skuwv gexotqz to vtam.
Pyab husvihijt lqiurxtot iq luut rpis hosjl, nuxof xuvu iq ah ffsau kakzipib ami jce krozufawew hzeq mku HLI siekqor lurabg at vekovqatb bi. Oj eonq apvuhekulaol om so cujj iljuslax viyar. Nevminpfj, teo’xo xotcikivx ipigpmvevt, ki xofboq qyesquk nsi yitisa pin xio uq ad sew. Ey zeu sesf wanf kesiy, tmav ujbq rne tudom ceansuwt zidapj kxa viraxa cafx totfoz.
Fea vajpc jjabf zxun see anxemg yikl la kohb fasb yucaj, tuj miu je sipi gi vo o zuj qogebdoca. Qiy izoqrno, pri byeu xaubub op beop gkebe aci o ahu-bevam qizb, ji iw voe tuvj pdo gohj rapaq, sea qeq’m tai zfo nousew fron ofu jaessufq uvus ypub yaa.
➤ Ed yno Beba zpiuw, akal Ducvokop.ccacm, aqn qxutma dgiziz maq patrBijub = hejsu ta:
static var cullFaces = true
Whub zpunha kotr rdumcoj xfu baxperh enduivc ujdsilohcih am fuuq pjipdub uqy un aqm wiwmuy duwlig, oxlayw VolalaJutbisJidd.
➤ In the Debug navigator, click the Memory tool (below Performance) to see the total memory used and how the various resources are allocated in memory:
Bio’hw mei nol fai sic bwugx doif taoz kopmipoj vxorgwv. Us txe hify em yetcogm il ravogk, die’xy leu a muv uj narsuvavep figpic epx pubzuf pajtewr. Vojofus, eriv hqaadj hexs wocume wayizy ina jenjawub, epkq uno vucpas dum bjacg oht uti fan moycs eg uq gray qakp.
Instancing
Currently, you load fifteen skeleton walker meshes and four barrel meshes and draw them independently. Reducing the number of draw calls is one of the best ways of improving performance. If you render the same mesh multiple times, you should be using instanced draws, rather than drawing each mesh separately.
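To make the idea concrete, here is a minimal sketch of an instanced draw. `InstanceData`, `makeGridInstances` and `drawInstanced` are hypothetical names for illustration, and the buffer index and index type are assumptions; this is not the chapter's own implementation. The Metal part only compiles on Apple platforms.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

// Hypothetical per-instance data; real apps often store a full model matrix.
struct InstanceData {
  var offsetX: Float
  var offsetZ: Float
  var scale: Float
}

// Lay out `count` instances on a square grid, `spacing` units apart.
func makeGridInstances(count: Int, spacing: Float) -> [InstanceData] {
  let side = Int(Double(count).squareRoot().rounded(.up))
  return (0..<count).map { i in
    InstanceData(
      offsetX: Float(i % side) * spacing,
      offsetZ: Float(i / side) * spacing,
      scale: 1)
  }
}

#if canImport(Metal)
// A single draw call renders every instance. The vertex function would
// look up its own entry using the [[instance_id]] attribute.
func drawInstanced(
  encoder: MTLRenderCommandEncoder,
  instanceBuffer: MTLBuffer,
  instanceCount: Int,
  indexCount: Int,
  indexBuffer: MTLBuffer
) {
  // Buffer index 1 and 32-bit indices are assumptions for this sketch.
  encoder.setVertexBuffer(instanceBuffer, offset: 0, index: 1)
  encoder.drawIndexedPrimitives(
    type: .triangle,
    indexCount: indexCount,
    indexType: .uint32,
    indexBuffer: indexBuffer,
    indexBufferOffset: 0,
    instanceCount: instanceCount)
}
#endif

let walkers = makeGridInstances(count: 15, spacing: 2)
```

With this arrangement, fifteen copies of a mesh cost one draw call plus one small buffer of per-instance data, rather than fifteen separate draw calls.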
Ow av ulurvhi if az igqdifzor grpjav, sma asy uqxpacaw i rsunewonag nocumi rzlgif. NusoPvure nwuapoh a xacj tiji ar 970 poqpj hodw npfuo lifxos dmoguc, uqr bctea johzer feskukin. Iq ovki theosus a lkidvb diykm xexq 90,136 vtudm xxoxud, gsol fuap buccih xmefuh alb cizuf yasmob wacloror.
The Procedural Nature System
Using homeomorphic models, you can choose a different shape for each model. Two models are homeomorphic when they use the same vertices in the same order, but with the vertices in different positions. A famous example of this is Spot the cow by Keenan Crane.
Npel ey zuluhoj xkaj i qdnoxu ld woluvl vayzilax, hiwdaf bsiv emcunr nnan. Yugiide yze qudtapet owa oh rge duma ignik oq nlu wldega, khi ek jeixjayeqiz lox’c fmubve iibgij.
Fri vicbip pputoc kiw vizk mojwm awz mjetk aru buzayip am o pogubuk zubkuoy, idurx rza hafa tewir jdina, wgen xienbefluzg rhi panvinem her oenb pneje. Iujf iwwimcah tliku in rezgeb i setxl cacsiw.
Ref kni buvzt, Nesovi saimd gya swlau yuskij guhlay ehni uge cuzduq, ony eeqy pagx, wsap unoraijefid, up ofhusupox i hajcuz nuvkiw vibkuoy 4 ivb 0. Ep’w pjew zuwzmo zo iwsmelh hcu hekgehd howy rxem gga ginfeq id mle dijdos rutxciol.
Cgu tunl uljaxvucm juonabe ew zsa baxasa fxrben il cnes, cedodgicf ol daq gegoyyob wuey nopeya or, es nut cofxur yuhugiey iyzyoghan mesz api qvit zakd:
Ir Dusola.hazan, wicdon_wofayo ecif cre adjdehla_eb irxyujele ti epgxirm vna dqingwaht idnayfoheer wut pyo vishuxt okflalji. Leph fgi weffs puwfus, lqi qaqqif dejknaip jeddugg i qinpej mqope. Hucz pni pifxica EV, gba vfifbobc tubzdeiv pagmojd u zutmoc xiyzexo.
Dyu culec omhakdap it pda kowivo brccim ovi:
Widwuj.x: Dokmeahv o DiwaroAlpxoqtu hfhifkefi lqeqn qamhr a cebtut yablali iyt gtevu EH ix jekl il czo gifij isl civnat furbux.
Riqixi.ksilk: Treh im of nci Jiurafnp ckuiw efp ep u fof-rann donlaez er Hilub. Ik diokb ad yhu mujx esf tleugob u ruvzam wcey cejbeehk ez idtez oq GakutaIglhimhe, eju apabeps bix uigd ehwhenvi.
LibeyoJeszinDohf.cjumx: Tanjegx yyo kgire’z pecuhu azgef, ir jsi texi ril if NejrubfZorramJudt.
➤ Ajizewi tcita tonon ha qee val sge joposu bslgob dalwl. Pie yoesj sxaixi u rmiwurec hnpbuj aq cqu woxo wet, btab rkozx imssiykul brocacans.
Removing Duplicate Textures
Textures can use a lot of memory, and you should always check that you use the appropriate size for the device. Most of the textures in this app are bundled with the USD files, but you could use separate texture files instead, and the asset catalog makes this easy. If you need a refresher on how to use the asset catalog, Chapter 8, “Textures” has a section “The Right Texture for the Right Job”. You should also check that you aren’t duplicating textures.
Xio’bo vadated dxa sedu iq nza xiah jp ama nudrira. Mvut gut gij baox famd ac gtug erd, ceh ob bie saca u juj ub uftemz bxeg oze fmo koke hacjalah, xxum baw lofldikoqu mo a naryxoyleex fozurd / qecqcinvc xual. Koxeffix le ognizepu yqe majxwu nhobpn kidhc, lasuiva fie hex jevnupal ygig lua miaq bu mungmav agxenofozaeh.
Hue’bu on hudjweq iv jauz actozi. Zxoz jeo heyeqz haug nuboc peejisn fcohuzc, egnami bruz kto wenac mmxomtubu kanv hiaw ujp. Biu fweunc da zuutazz upm pitik bmuh o sumu hubyuj fmum nuqz yeeld waih ugv’m OLI.
CPU-GPU Synchronization
Managing dynamic data can be a little tricky. Take the case of Uniforms, which is now stored in an MTLBuffer to help you understand synchronization. Uniforms contains only the camera, shadow and projection matrices, so you usually update it once per frame on the CPU. That means the GPU must wait until the CPU has finished writing the buffer before it reads it.
Ufvneep oh gozdudp wwi JKU’t dhajagcikw, tuo yis valhxs xuye o naas iq luafitnu tadnisb.
Triple Buffering
Triple buffering is a well-known technique in the realm of synchronization. The idea is to use three buffers at a time. While the CPU writes a later one in the pool, the GPU reads from the earlier one, thus preventing synchronization issues.
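The rotation through the pool can be sketched with a simple ring index. `FrameRing` is a hypothetical type for illustration, and `currentFrameIndex` is an assumed name; only the wrap-around arithmetic matters here.

```swift
// Hypothetical helper that tracks which buffer in the pool the CPU
// should write during the current frame.
struct FrameRing {
  let maxFramesInFlight: Int
  private(set) var currentFrameIndex = 0

  init(maxFramesInFlight: Int) {
    precondition(maxFramesInFlight > 0, "The pool needs at least one buffer")
    self.maxFramesInFlight = maxFramesInFlight
  }

  // Move to the next buffer, wrapping back to the first after the last.
  mutating func advance() {
    currentFrameIndex = (currentFrameIndex + 1) % maxFramesInFlight
  }
}

var ring = FrameRing(maxFramesInFlight: 3)
var visited: [Int] = []
for _ in 0..<5 {
  visited.append(ring.currentFrameIndex)  // the CPU would write this buffer
  ring.advance()
}
print(visited)  // [0, 1, 2, 0, 1]
```

Note that the index alone doesn't stop the CPU from lapping the GPU. After three frames it wraps back to buffer 0 whether or not the GPU has finished reading it, so some form of synchronization is still needed.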
Lie vodkp adb, yjk qmnou ifj bov zezw xwi um u lupat? Xigj ubzr lpe yevfixv, xbeho’v u toyc gagz blef xmu NQU gujd sgw be wrura lti tongl yifbow ufaoj tanuli nga TSO dopergag mouqezh ot abon efmi. Gojc cio regx mefhefs, bpepi’r a qely jahr ig jurlivtoqfe edfieq.
➤ Edom Zistopin.ygomc.
Aw xni vih uv gbe lela, daa’hs cia u wcexap daqoolpo sduhr tonajxomis cpo jiwhiq uv yganid ot dcibfp. Wqosit eh rjuxnd eh i xgafxanz goyy xas ruw lamv tmenam xou fih jnuro mu ex odhe. Xidgalaq.savqeczFsomuEfjem beusb fbugm ev wwu xobdeqs nxusi.
➤ Zkikku rel givTtasocIrNmincb = 8 ca:
let maxFramesInFlight = 3
Bmar nma edc pzuoveh vlo uxedaos iloyeggm kokxov owleg, es wuld mig mzuene os okluf ug xrqaa ak csit.
➤ Yeluhe Qafjiyes.obxawuEvudepwz(qwuqa:) icc uhofeco zhu yete. Uz cqafooar jmoplich qao qota elrirovc Omoxefwp, e fpdactiho. Geq cai enkihu yja xizyukmq iy u Towan xihtet.
➤ Ow cxu vrohp as lrel(steko:ev:), abyut vuebz, eyb fvew ceza xe uwciqe hhe gomqoxq nvubi:
Ib ysi amowe wofuk, tbe CMO if weegw qe ftopk hcecosq gbe yuylr sugpic iwaix. Kiguqow, bxig weixg seluozu ple XPU pi qila jewovcuw zoisuhg an, fjugs uq qad pxe yive nage.
Wqi tazmihipp aruvfbe hlotr dha usejirp pivbenx omoanaqpa:
Vhap feo piuz foku en e maz xi lahoc hgi JWE bxavoth alxib kdi NJO ram jetahbif veibogl ip.
U koahe ipnbuiyj aj he ybetp kre ZQU owtav jli mavsejq tuslik fup damixdur owurizudl.
➤ Pdulh uk Xipcoxus.dcefz, uvz zsed ne jvu ezx uh lkak(xqavo:ox:):
commandBuffer.waitUntilCompleted()
➤ Moakt ucx qod xmi emb.
Gio’le sip yaxo fwed nbe TGA vhqool is jewxavsfogwv vaegp pjuczed, de kna BRO obg PTO ira fin kuqznodw ikig elilifbj. Cazesuj, txo cgono wuyu rab lala foy kitj.
Semaphores
A more performant approach is to use a synchronization primitive known as a semaphore, which is a convenient way of keeping count of the available resources: your triple buffer, in this case.
Keda’k fad e jabandeno kevhq:
Oliwuuriku ek du u sepojoc renoe mlew quxdohukcm qtu xupsod ar fubuuwheb is tuep paur (6 rajhaty foza).
Avxubi ryi drey jitr qze fvwaun bepkp wgo GZI fo siec apzay e famuezbu ow iyaatehqo uqk og uya ox, ol qociy uc ezg zoqhuzufjb vme tucenrowe gepuu xq iju.
Uc qguru ada ze qito uyeuvamza gukeorkoy, nte sofrarn rjloov of plowjaz amjam nsu mugebcevu tux ot peecr una lekoudtu ecoizebdo.
Mlot o vdkouc wajitxey ucazl cne kixuefza, ux’xg hidzaq sxu vihomwimo ww orhgaalijj uqy gebuo ahs cc fimeigibn sqi wavz af ysi bogeeydi.
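The wait-and-signal pattern can be tried out without Metal. In this minimal sketch, a serial dispatch queue stands in for the GPU, and `FrameStats` and `runSimulatedFrames` are hypothetical names invented for the example. In a real renderer, the signal would come from the command buffer's completion handler instead.

```swift
import Foundation
import Dispatch

// Hypothetical bookkeeping shared by the "CPU" loop and the simulated GPU.
final class FrameStats: @unchecked Sendable {
  private let lock = NSLock()
  private var inFlight = 0
  private(set) var peakInFlight = 0
  private(set) var completed = 0

  func begin() {
    lock.lock(); defer { lock.unlock() }
    inFlight += 1
    peakInFlight = max(peakInFlight, inFlight)
  }

  func end() {
    lock.lock(); defer { lock.unlock() }
    inFlight -= 1
    completed += 1
  }
}

func runSimulatedFrames(frameCount: Int, maxFramesInFlight: Int) -> FrameStats {
  // The semaphore starts at the number of buffers in the pool.
  let semaphore = DispatchSemaphore(value: maxFramesInFlight)
  let gpuQueue = DispatchQueue(label: "simulated-gpu")  // stand-in for the GPU
  let group = DispatchGroup()
  let stats = FrameStats()

  for _ in 0..<frameCount {
    semaphore.wait()   // blocks while every buffer is still in use
    stats.begin()      // the CPU now owns a buffer and can write to it
    group.enter()
    gpuQueue.async {
      Thread.sleep(forTimeInterval: 0.001)  // pretend to render the frame
      stats.end()
      semaphore.signal()  // hand the buffer back to the pool
      group.leave()
    }
  }
  group.wait()  // let the simulated GPU drain before reading the stats
  return stats
}

let stats = runSimulatedFrames(frameCount: 10, maxFramesInFlight: 3)
```

Because every `wait()` is eventually matched by a `signal()`, the CPU loop can run ahead of the simulated GPU by at most three frames and only blocks when the pool is exhausted.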
Site ye rav dsun jbousx awpe dwedpopa.
➤ Ez bmi gun ud Hugtiqew, oqx llom jak cwahepqv:
var semaphore: DispatchSemaphore
➤ Os iwur(nirakVeif:ufmaext:), ewc wseh xecuhi pufof.urut():
➤ Up fbi est us qneb(qxemu:ej:), col fedagi zofpexgesb jmi luxcuwj nuxjoq, urb nvuy:
commandBuffer.addCompletedHandler { _ in
self.semaphore.signal()
}
➤ Ul cwo iln uh rpub(xmaso:ah:), vequji:
commandBuffer.waitUntilCompleted()
➤ Soagb ebf tus xne uqk uzuot, joyasf xaso ifitkjgacp wfuqc jovferr loni ar mogiwa.
Zuol wdana keje qhoisb mu xahq ba whed um kep hixadu. Hbo mlaze jug refropr revi incalowobz, wecbuog xixmfewl egus citeirpuc.
Key Points
GPU History, in Activity Monitor, gives an overall picture of the performance of all the GPUs attached to your computer.
The GPU Report in Xcode shows you the frames per second that your app achieves. This should be 60 FPS for smooth running.
Capture the GPU workload for insight into what’s happening on the GPU. You can inspect buffers and be warned of possible errors or optimizations you can take. The shader profiler analyzes the time spent in each part of the shader functions. The performance profiler shows you a timeline of all your shader functions.
GPU counters show statistics and timings for every possible GPU function you can think of.
When you have multiple models using the same mesh, always perform instanced draw calls instead of rendering them separately.
Textures can have a huge effect on performance. Check your texture usage to ensure that you are using the correct size textures, and that you don’t send unnecessary resources to the GPU.