Saturday, February 28, 2009

tac: Converging on a graphics solution

After weeks of experimentation, I think I've converged on a best solution (for now): Most of the game will be drawn with opaque triangles (fast), which should work great for most map-elements that have simple shapes (such as floor tiles, walls, boxes, hedges, etc.). Then in the second pass, we'll draw alpha-masked sprites using alpha testing. The number of alpha-masked sprites should be kept minimal (units, enemies, and some decorations) to maintain 30+ FPS. The 2-pass drawing relies on depth testing, so I don't need to sort anything myself.

Another detail, thanks to some Apple developer forum folks: Put opaque and alpha sprites in different atlases, and minimize their size. Opaque textures don't need alpha channel!

With the opaque sprites, we can probably afford more than 2 triangles per sprite. This will allow for more interesting shapes than quadrangles. Of course, at a certain point, it'll be cheaper to use alpha-masking than a ton of triangles. I should probably write a tool to do mesh simplification in 2D (start with a highly detailed triangulation where each pixel is triangulated), with tweakable thresholds, and then let artists tweak the 2D triangulation manually.

Spatial partitioning will still be necessary to cull the number of vertices we send. This affects FPS greatly.

All this is what the tech note suggests, and it sounds like my only real option. There just doesn't seem to be a fast way to raster tons of alpha-masked sprites using OpenGL ES.

tac: Speed progress

OK, after running some tests, here are some conclusions:

- Culling off-screen triangles is important. There's a 20FPS difference between drawing 15,000 off-screen vertices and 5,400 - the latter should be plenty for visible tiles, so I need to do my own spatial culling.
- Tight-fitting my geometry around the sprite (i.e., minimizing the number of 0-alpha pixels I draw) helps a lot as well. I should auto-fit geometry in my atlas tool. But this still might not be fast enough if the screen is full of tall tacos, in which case FPS drops to 20 (alpha testing + z-buffer).
- Just using blending with tight-fitting gives me 40FPS with 30x30! Drawing all tall tacos drops it to 24FPS, which is not bad at all.
- Disabling alpha blending AND testing gives me a huge boost - to 45-50FPS even with tall tacos, and it's not really fill-limited at all. With 15,000+ vertices, it keeps a steady 30FPS.
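The auto-fit step for the atlas tool could start out as a simple bounding-box pass over the alpha channel. This is only a rough sketch in Python; `tight_bounds`, the `threshold` parameter, and the list-of-rows alpha layout are made up for illustration, and a real version would fit a tighter polygon rather than a rectangle:

```python
def tight_bounds(alpha, threshold=0):
    """Return (left, top, right, bottom) of the pixels with alpha > threshold.

    `alpha` is a list of rows, each a list of 0-255 alpha values.
    right/bottom are exclusive. Returns None for a fully transparent sprite.
    """
    rows = [y for y, row in enumerate(alpha) if any(a > threshold for a in row)]
    if not rows:
        return None
    cols = [x for x in range(len(alpha[0]))
            if any(row[x] > threshold for row in alpha)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


# Hypothetical 4x4 sprite whose visible pixels occupy the middle 2x2 block.
sprite = [
    [0, 0,   0,   0],
    [0, 255, 128, 0],
    [0, 255, 255, 0],
    [0, 0,   0,   0],
]
print(tight_bounds(sprite))  # (1, 1, 3, 3)
```

The quad for this sprite would then cover 4 pixels instead of 16, which is the kind of fill saving the tight-fitting test was measuring.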

That last one is the kicker. Basically, I can potentially triangulate all of my sprites, to make them perfectly fit, and disable alpha testing/blending completely. I'll still need culling to avoid sending too many triangles, since each "sprite" might use a lot. I'll need to somehow manage the shuffling of all those vertices as well, to keep everything in one draw call with the spatial culling.

But, blending+sorting might be fast enough and yields the best visual quality. It's totally fill-rate limited with no visibility culling. While my test map yields 20FPS worst case, real maps could have much more overlap, causing more fill. There are some details I need to sort out (no pun intended) with between-tile animation, but I think I have a good idea for that (the problem is what order I draw things in - left to right, top to bottom, etc.). Spatial culling is still necessary, since off-screen triangles do still affect FPS. I could do some of my own visibility culling as well, and assuming the map isn't swiss cheese, it should reduce over-draw.

Looks like I'll need spatial culling no matter what I decide to do. Basically, I just need a few redundant copies of all the data in separate draw lists, and just draw the ones that are currently visible (easy). Hopefully I won't need more than 4 draw calls at a time. Duplicating the data isn't a huge deal - display lists won't take more than 2 MB each.
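One way the "draw only the visible lists" idea might look, sketched in Python. Assumptions that are mine and not tested on the device: the map is split into square chunks with one draw list each, the view has already been converted to a bounding box in tile units (this ignores the isometric projection), and `visible_chunks` is a made-up name:

```python
def visible_chunks(view, map_size, chunk_size):
    """Return (cx, cy) indices of the chunks overlapping the view rectangle.

    view is (x0, y0, x1, y1) in whole tile units, with x1/y1 exclusive.
    map_size and chunk_size are in tiles; the map is assumed square.
    """
    x0, y0, x1, y1 = view
    # Clamp the view to the map so off-map regions select nothing.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, map_size), min(y1, map_size)
    if x0 >= x1 or y0 >= y1:
        return []
    first_cx, last_cx = x0 // chunk_size, (x1 - 1) // chunk_size
    first_cy, last_cy = y0 // chunk_size, (y1 - 1) // chunk_size
    return [(cx, cy)
            for cy in range(first_cy, last_cy + 1)
            for cx in range(first_cx, last_cx + 1)]


# 50x50 map in 25-tile chunks, with a 10x10-tile view straddling two chunks.
print(visible_chunks((20, 5, 30, 15), 50, 25))  # [(0, 0), (1, 0)]
```

As long as a chunk is at least as big as the view in both directions, the view can touch at most 2 chunks per axis, so at most 4 draw calls.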

Triangulated-sprites could be an interesting approach, but I'd need to limit the shape-detail of each sprite. This is fine for things like walls and pillars, but for characters with complex silhouettes, this could be limiting. But then again, if the artist embraces it, it could make for a unique visual style.

There's still one thing I must try before deciding between alpha-sprites or triangulated-sprites: Texture compression. Tri-stripping is probably not worth trying, and glDrawTex is unlikely to be fast at all...maybe I'll try them when I'm bored out of my mind with everything else.

Friday, February 27, 2009

tac: OK, I lied.

Must do more graphics optimization! Lol. After more reading, it turns out that drawing directly to the frame buffer is impossible with the official iPhone SDK. Other software-rendered games have encountered this problem, and can only achieve smooth FPS by using jailbroken APIs (http://blog.steventroughtonsmith.com/2008/07/whither-coresurface.html).

So, back to optimizing OpenGL drawing. I did some really quick manual benchmarks with the 50x50 map, so about 5,000 triangles with 15,000 unique vertices:

- 20FPS with fullscreen fill, but only a small fraction of the tris are visible <--- most common usage scenario
- 17FPS with fullscreen fill, about half tris are visible
- 19FPS with halfscreen fill, all tris visible
- 40FPS with noscreen fill, no tris visible

I'm not sure why I didn't see this before, but that last benchmark is alarming. Basically, just the act of sending 5k triangles and culling them reduces me from 60 to 40FPS! Clearly, there is a penalty for drawing stuff that won't show up. So maybe doing my own culling will help after all. But I did try this before, so I'm not sure...I'll have to try it again - I was probably using different parameters and what not. Other things to try:

- Using TRIANGLE_STRIP instead of TRIANGLES (unlikely to make a difference)
- glDrawTex (can't see this being fast at all...)
- Toggling alpha blend/test
- Tightening quads to avoid 0-alpha fragments (definitely must do this)
- Spatial partitioning to avoid sending off-screen sprites
- Texture compression

tac: 2D??

I was talking to a friend of mine about this project, and I mentioned the performance issues. His basic reply was, "I'm no programmer, but shouldn't the iPhone be able to handle a 15 year old 2D game with no problems?" I was almost going to start explaining why OpenGL was limited and why I'm using it and etc. etc. etc...but those are just excuses really. He's right. The iPhone should be able to handle something like this, so I must be doing something very wrong.

So I'm going to investigate 2D drawing on the iPhone again. Some experiments I just did with CoreGraphics and Quartz 2D (it's a little confusing what the distinction is really) weren't too promising, but there seem to be more options, such as CA/CGLayer. I'll look into those tomorrow. Surely I can just draw to the frame-buffer directly somehow! If I could do that, then performance shouldn't be an issue at all. Like, how does iBeer achieve its fast 2D animation, not to mention some fluid dynamics driving it? I should be able to do that - and I don't know how right now.

Why was I so tied to the OGL solution? I think I enjoyed the challenge of making something do something it wasn't meant to do. The possibility of perfect, elegant zooming was also neat. But a single feature isn't worth all the work it would take to get it running fast. Lesson learned.

Nonetheless, I'll probably keep the slow OGL solution in the main branch and continue to add game-play features to that. I have one friend who wants to help design stuff, so the quicker I can get that pipeline together the better. I'll experiment with 2D drawing in a separate prototype.

UPDATE: There's an OpenGL extension supported by iPhone: GL_OES_draw_texture. It's meant for things like drawing font glyphs - just what I need! I knew about point sprites, but this is better than point sprites (you define a sub-region of the texture to draw) since it avoids texture switching. Maybe this is what I should use, since from what I gather, Quartz and Core Graphics are all built on OpenGL anyway. So this may be the lowest level possible.

Monday, February 23, 2009

tac: A screenshot

For archival and reminiscing purposes, I'll start posting screenshots regularly. These certainly aren't shots I'd show people to get them excited for the game, but that's not really the purpose of this blog. So here's the first one:

Things to note: multiple levels and layers are working (this is with z-buffering, but that might change soon). Those X-Com textures, I just cut them out of some X-Com screen shots. The green and blue blocks are purposefully simple, as they're meant to be size/alpha-mask references for real art. It's nothing exciting - hopefully that'll change soon. I'll probably start to post YouTube videos as well, since it's hard to tell much from screen shots for a game like this.

I think posting results to some public format, even if no one is reading my blog right now, is still a good idea. For some reason, seeing your results presented in that format automatically gives you better perspective about the real value of your results. When you're deep in the code and working, your perspective tends to be warped, and what you think is amazing is probably pretty mundane to an outside observer. But put a screen shot on a blog, and your mind starts comparing it to other screen shots you've seen on blogs - such as those for real games - and you start realizing how much work you have left to do. Or not - heh.

Sunday, February 22, 2009

tac: Rethinking...again

Just thought of another drawing scheme that would allow me to use alpha blending and retain cheap flexibility without Z-buffering: The display list (what I'm calling my vertex, texcoord, etc. arrays) has an entry for every position that anything could possibly occupy, and it is sorted (just a one time thing). Empty positions are set to singular triangles that are way off-screen. Off-screen tris don't seem to affect framerate at all, so the rasterization should be as fast as just drawing non-empty positions. Of course, you're now sending off a ton more data - will this slow things down? I'll have to try this tmrw - it's an easy experiment.
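Roughly what I have in mind, as a Python sketch rather than the real vertex arrays. The `OFFSCREEN` coordinate, the one-triangle-per-slot layout, and the function names are all placeholders I made up for illustration:

```python
OFFSCREEN = (-10000.0, -10000.0)   # hypothetical "way off-screen" coordinate
VERTS_PER_SLOT = 3                 # one triangle per slot to keep the sketch small


def make_display_list(num_slots):
    """Every slot starts as a degenerate triangle parked off-screen."""
    return [OFFSCREEN] * (num_slots * VERTS_PER_SLOT)


def set_slot(verts, slot, triangle):
    """Overwrite one slot in place; the slot (draw) order never changes."""
    start = slot * VERTS_PER_SLOT
    verts[start:start + VERTS_PER_SLOT] = triangle


def clear_slot(verts, slot):
    """Park the slot off-screen again, e.g. when a wall is destroyed."""
    set_slot(verts, slot, [OFFSCREEN] * VERTS_PER_SLOT)


verts = make_display_list(4)                      # 4 pre-sorted positions
set_slot(verts, 2, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(len(verts))                                 # 12
clear_slot(verts, 2)
```

Since slots are sorted once up front, adding or removing something is just an in-place overwrite, with no re-sorting and no change in array length.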

I think off-grid animation (when a unit is between two tiles) can be done by temporarily adjusting the source triangle's position, but I'm not sure if this would work in all cases. You may have situations where a unit moves up/down, and ends up obstructing something incorrectly during the move. I'll have to think more about this if the experiment pays off.

In other news, I've decided to use the ProggyClean free font (http://www.proggyfonts.com) for the game's text. Mainly because it's free (no license at all) and it's fixed-width. Font licensing can be damn expensive. Give it up for free stuff from the internet.

Saturday, February 21, 2009

tac: Rethinking some decisions

I did a quick experiment, and I don't think partitioning the map will help rendering speed. Drawing only a 10x10 map gets about 30FPS when the screen is nearly full. When I zoom in on a 50x50, it's about 27FPS. So partitioning might save me 3FPS...not really worth all the pain. Drawing 200 triangles can't be that slow, right...? I can't really do much about the fill-rate when it's zoomed in.

As for Z-buffering, I may have to use it after all. I just need that flexibility, especially for animations that change depth a lot (missile flying through). I may have to sacrifice alpha-blending and use alpha-testing. There are some other things I can try for sorting, such as sorting only indices and using glDrawElements (so I only have to move around the indices). But I'll use the Z-buffer for now.
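The index-only sorting idea, sketched in Python so I don't forget it. The layout is an assumption: each sprite owns a fixed run of vertices, and larger depth means farther away. `sorted_indices` is a made-up helper; the resulting list is what would be handed to glDrawElements:

```python
def sorted_indices(depths, verts_per_sprite=6):
    """Build an index list that draws sprites back to front.

    depths[i] is the depth of sprite i (larger = farther away, an assumed
    convention). Sprite i owns vertices [i*n, i*n + n) in the vertex array,
    which is never touched; only the returned index list changes.
    """
    order = sorted(range(len(depths)), key=lambda i: depths[i], reverse=True)
    indices = []
    for sprite in order:
        first = sprite * verts_per_sprite
        indices.extend(range(first, first + verts_per_sprite))
    return indices


# Three sprites with 2 vertices each (shortened so the output is readable).
print(sorted_indices([1.0, 3.0, 2.0], verts_per_sprite=2))  # [2, 3, 4, 5, 0, 1]
```

Re-sorting then only moves small integers around instead of whole position/texcoord records.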

I'm pretty stuck right now, so I'm gonna take a break from rendering optimization and move on to the UI and gameplay. I've been sketching up some UI schemes (what to click to do what, etc.), and I think it can be made pretty minimal yet not lose the depth of X-Com's controls. Other iPhone apps, like Safari, employ some pretty slick ways of augmenting the touch controls, such as the magnifying glass for text: touch and hold your finger for 0.5 secs on some text, and a little magnifying glass comes up, and then you can drag to move the cursor between letters precisely. I'd like to do that for toggling map-layers, or maybe some context sensitive menus.

tac: More progress

Atlasing is now working with "map data" (currently a static array in my code), and it can draw tacos and floor tiles (using the same shape geometry, but different alpha masks in the textures). Now, I've got a simple map that I can easily edit. The atlas-building tool is in Matlab - w00t.

I also added pinch-zooming. That was a major pain, since my logic for handling touches was a bit more complicated than necessary. I'm still not quite sure why the first way didn't work, but after hours of debugging, I went to a simpler model, and now it all works well. Some lessons you have to learn over and over: Keep It Simple, Steven.

I'm considering going back to sorting instead of z-buffering. Alpha-blend is just too crucial for the sprites to look good over each other. Sorting could involve some heavy CPU crunching every time something changes. But the good thing is, changes are infrequent compared to the frame rate. I only change geometry (easy) when stuff happens (e.g. a wall is destroyed). Although...animations still involve lots of changes, such as a missile flying across the map. Hmm, I'll have to think about this a bit more. Re-sorting could be expensive since it involves moving a ton of data around, although simple in concept.

Speed is still not satisfactory. I can draw 50x50 at 30FPS, but that's only one layer. I need to draw probably 3x as much for a multi-layer level, with sprites layered over each-other (another issue with sorting). I'll have to consider some partitioning to make things draw faster when zoomed-in.

Next action items:
- Think about sorting vs. z-buffer.
- Think about layering sprites.
- Think about partitioning.

Sunday, February 15, 2009

tac: Graphics performance

Despite my game being 2D, I still want to use OGL to draw the graphics for analog zooming and - maybe - z-buffering (more on that later). The iPhone is powerful for a mobile device, but still pretty weak sauce for 3D graphics and crunching through triangles. How many triangles will I need? It'll be a function of map size. Just eye-balling it, 100x100 square tiles seems to be plenty for a typical X-Com map. Currently, using the array+atlas method, I can get about 15FPS for 100x100. Number of on-screen triangles affects FPS by ~5. Fill does not seem to be an issue. Some potential speed ups:

* Using GLbytes for color: This doesn't even seem to work...we'll save that for later. Disabling color gives me 17FPS.

* Using GLbytes for texcoords: This gets me to 21-24FPS! Cool.

* Using GLbytes for position: Initially, I thought I couldn't do this since my tiles are 5-wide, so I'd need 500 units of precision. However, I can just send in unit-squares (so 100-wide) and transform them with a matrix! Let's try this...hmm didn't help too much. We still get 21-24FPS. Simplifies a lot of the code though, so that's good.

* Using glDrawElements instead of glDrawArrays: I'm not sure if this will save me anything. Using indices (glDrawElements) allows you to re-use vertices, but all my vertices will be different anyway in texcoords (they will only share positions), so using indices will just add the overhead of transferring index-data. Kind of a bummer, cuz the positions get shared a lot obviously. Oh well. NOTE: I tried glDrawElements, and it was slower - about 18FPS from 21-24.

* Interleaving data in arrays: This could buy me a lot by making the data more cache-friendly. Trying now...it doesn't seem to help much. Maybe 1 or 2 FPS without color. Hard to say if this helps much, but I guess I'll keep it - makes code cleaner.
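To put numbers on the byte-sized, interleaved layout, here's a small Python sketch using `struct`. The 8x8 atlas grid (`ATLAS_CELLS`) and the helper names are assumptions for illustration; the multiplies in `unpack_to_world` stand in for what the modelview and texture matrices would do on the GL side:

```python
import struct

FLOAT_VERTEX = struct.Struct("<4f")  # x, y, u, v as 32-bit floats
BYTE_VERTEX = struct.Struct("<4b")   # x, y, u, v as signed 8-bit integers

TILE_WIDTH = 5.0   # world units per tile (from the notes above)
ATLAS_CELLS = 8    # assumed 8x8 grid of tile textures in the atlas


def pack_vertex(grid_x, grid_y, cell_u, cell_v):
    """Pack one interleaved vertex in grid/cell units (each must fit -128..127)."""
    return BYTE_VERTEX.pack(grid_x, grid_y, cell_u, cell_v)


def unpack_to_world(data):
    """Scale back to world units and 0..1 texcoords, as the matrices would."""
    gx, gy, cu, cv = BYTE_VERTEX.unpack(data)
    return (gx * TILE_WIDTH, gy * TILE_WIDTH, cu / ATLAS_CELLS, cv / ATLAS_CELLS)


print(FLOAT_VERTEX.size, BYTE_VERTEX.size)          # 16 4
print(unpack_to_world(pack_vertex(100, 100, 8, 8)))  # (500.0, 500.0, 1.0, 1.0)
```

So the far corner of a 100-wide map still fits in a signed byte, and each vertex shrinks from 16 bytes to 4 before color is added.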

Welp, after all that, I get 40FPS with a 50x50 map. Pretty solid, and pretty flexible as well. If I need to make a tile larger (for a wall or something), I can easily do that. A remaining performance issue is overlaid sprites. Of course, I can group all those in with the main data array - I just need one data array per texture atlas. I may need to do a lot of shuffling when objects move, but that's OK since objects move very rarely. Color modulation also slows things down a lot. Maybe OGL's fixed function lighting is faster? That's an option.

There's the issue of depth-buffering. I can potentially do all sorting myself (not conceptually hard), but that limits how dynamic the geometry can be. For example, if a unit is moving between squares, its z-depth may change and be in-between two layers. Sorting would require everything to be re-sorted just because of that, whereas Z-buffering would handle that alright (the unit's depth will just be the average of the depths it's moving between). The downside of z-buffering, however, is that it limits alpha-blending for multi-layer transparency. I think that's something I can live without though, so I will use Z-buffering. Then for dynamic geometry, I can just modify the vert-position data precisely to animate position.
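The in-between depth is just a linear blend. A tiny Python sketch, where `unit_depth` and the 0-to-1 progress parameter `t` are names I'm making up here:

```python
def unit_depth(depth_from, depth_to, t):
    """Blend two tile depths; t runs from 0.0 (start tile) to 1.0 (arrival)."""
    return depth_from + (depth_to - depth_from) * t


# Halfway between a tile at depth 2.0 and one at depth 4.0.
print(unit_depth(2.0, 4.0, 0.5))  # 3.0
```

At the midpoint this gives the average of the two depths, and the z-buffer sorts out the rest without touching any other sprite.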

Next step: make texture atlasing work with the floor data array. Then, integrate TACtical Objects (TACOs) into the data array stuff, so I can start coding gameplay.

Tuesday, February 10, 2009

tac: We begin.

I'll start keeping a development journal for my iPhone game project. It could be useful/interesting.

The game is Wizard Tactics - or just 'tac'. The goal is to make an X-Com clone that captures the freedom and depth of the turn-based classic. Other than that, I don't have any concrete design goals in mind (How long should play sessions be? How newb-friendly?). The iPhone is an ideal platform thanks to its multi-touch interface, since X-Com was completely mouse-driven.

Why wizards? I figure this will give us the most design freedom (spells can do whatever we want) and the least art-cost. For attack-animations, we just need the wizard waving a wand, and we could probably use that for every attack or action. Healing, casting a small spell, casting a big spell, whatever - all the same animation.

Currently, I'm trying to implement the basic isometric-tile system X-Com used for its tactical mode. Some good reference shots are here: http://images.google.com/imgres?imgurl=http://www.ibiblio.org/GameBytes/issue19/greviews/ufo3.gif&imgrefurl=http://www.ibiblio.org/GameBytes/issue19/greviews/ufo.html&usg=__DzQ6_yCXGulw6duthxgnelne3Lk=&h=480&w=640&sz=65&hl=en&start=21&sig2=o0BpFsCu3Rr6_A2mJqoh-Q&tbnid=60vLSJnYz_B3KM:&tbnh=103&tbnw=137&ei=hSiSSaGJCNqhtweKhrTJCw&prev=/images%3Fq%3Dx-com%26start%3D18%26gbv%3D2%26ndsp%3D18%26hl%3Den%26sa%3DN

Speed is an issue. Each tile needs to be drawn with its own texture. There are some large pieces of ground here and there, but even then it could be charred by an explosion or something. So we need to quickly draw hundreds (maybe thousands) of quad-tiles while potentially switching textures for each one. The iPhone is fast, but not that fast for a naive implementation. Currently, my 50x50 map draws at 4FPS - unacceptable.

There's a fairly straightforward solution that will probably work: Put all tile geometry (basically, triangle-pairs arranged in an isometric grid) in a huge vertex array, and have an accompanying texcoord array. Each tile will have its own unique texcoords. Then use one giant texture (an 'atlas') to store all tile-textures, so if a tile needs to have a certain texture, just find it in the atlas and move its texcoords over to it. This should be nice and fast. You can also have a color array to modulate for lighting effects.
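The "move its texcoords over" step could look something like this. It's a sketch in Python, not the actual game code: `atlas_texcoords`, the left-to-right/top-to-bottom cell numbering, and the corner order are all assumptions:

```python
def atlas_texcoords(tile_index, atlas_size=1024, tile_size=128):
    """Return the four (u, v) corners of a tile's cell in the atlas.

    Cells are numbered left to right, top to bottom (an assumed layout).
    Corners are ordered: top-left, top-right, bottom-right, bottom-left.
    """
    cells_per_row = atlas_size // tile_size
    if not 0 <= tile_index < cells_per_row * cells_per_row:
        raise ValueError("tile_index does not fit in this atlas")
    col = tile_index % cells_per_row
    row = tile_index // cells_per_row
    step = tile_size / atlas_size
    u0, v0 = col * step, row * step
    u1, v1 = u0 + step, v0 + step
    return [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]


# Cell 9 sits in column 1, row 1 of an 8x8 atlas.
print(atlas_texcoords(9))
# [(0.125, 0.125), (0.25, 0.125), (0.25, 0.25), (0.125, 0.25)]
```

Changing a tile's texture then means overwriting its entries in the texcoord array with these values; the vertex positions stay put and there's no texture switch.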

One issue is that a single atlas may not fit all possible tile-textures. The iPhone caps textures at 1024x1024. If you use 128x128 pixels per tile, you can fit 8 x 8 = 64 tiles per atlas. That's pretty good already, and you can always reduce tile-texture resolution if this isn't enough. I should make a utility, in Matlab or something, to put these texture atlases together automatically.
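The bookkeeping half of that utility is simple enough to sketch now. This is Python for illustration only; `assign_atlas_slots` and the tile names are hypothetical, and it only decides where each tile goes (no image copying):

```python
def assign_atlas_slots(tile_names, atlas_size=1024, tile_size=128):
    """Map each tile name to (atlas_number, column, row), filling atlases in order."""
    per_row = atlas_size // tile_size
    per_atlas = per_row * per_row          # 8 * 8 = 64 with the defaults
    slots = {}
    for i, name in enumerate(tile_names):
        atlas, cell = divmod(i, per_atlas)
        row, col = divmod(cell, per_row)
        slots[name] = (atlas, col, row)
    return slots


# 70 hypothetical tiles: the first 64 fill atlas 0, the rest spill into atlas 1.
names = ["tile%d" % i for i in range(70)]
slots = assign_atlas_slots(names)
print(slots["tile63"], slots["tile64"])  # (0, 7, 7) (1, 0, 0)
```

Halving the tile size to 64x64 would quadruple the capacity to 256 tiles per atlas.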

Another concern for drawing is dealing with overlap. Currently, I sort everything from back-to-front (easy for a grid) and just draw them. I can probably use the depth buffer and avoid sorting myself - that can go either way I think. But what if you have varying depth within a tile? For example, if a wizard is on a tile of tall grass, should the blades of grass draw over his feet? The current system wouldn't allow for that. That probably will be something I'll just put up with, since giving each tile a depth-texture seems overkill. I could just go full-3D as well, but that opens up all sorts of performance issues. And plus, 2D art is awesome.
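For reference, the back-to-front grid sort is about this much code. A Python sketch under my own assumptions: tiles are `(x, y, layer)` tuples, a larger `x + y` is closer to the viewer, and ties go to the lower layer first:

```python
def back_to_front(tiles):
    """Sort (x, y, layer) tiles so farther tiles come first.

    Assumes a larger x + y is closer to the viewer; ties are broken
    by layer, with lower layers drawn first.
    """
    return sorted(tiles, key=lambda t: (t[0] + t[1], t[2]))


tiles = [(1, 1, 0), (0, 0, 1), (0, 0, 0), (1, 0, 0)]
print(back_to_front(tiles))
# [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 0)]
```

This orders whole tiles only, which is exactly why the grass-over-feet case can't be handled: everything inside one tile shares a single sort key.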