Chapter 12 - December 23rd, 2013

More for less

With the split screen horizontal scrolling now operational, it was time for some enhancements and speed optimisations.

In summary, my scrolling routine decodes the terrain data and generates a vertical strip containing the ceiling and ground detail, which scrolls in from the right side of the screen. The data is formatted as Key Strips which are spaced 8 strips apart. The decoding routine calculates the strips in between by sloping them to connect one key strip to the next. These same key strips also contain data for any objects which may be needed at that point, but this has not yet been implemented. This system allows for a compressed data format that holds 200 screens' worth of data in a single 8K MMU block of memory.
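As a rough illustration of the sloping between key strips, and assuming the in-between strips follow a straight line (the exact formula is not stated here, so this is only a guess at the general idea), the terrain edge height $h_i$ of strip $i$ between two key heights $h_0$ and $h_8$ could be written as:

```latex
h_i = h_0 + \frac{i\,(h_8 - h_0)}{8}, \qquad i = 0, 1, \dots, 8
```

With hypothetical key heights $h_0 = 40$ and $h_8 = 56$, each strip would step by 2, giving 40, 42, 44 and so on up to 56.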

The terrain at this point was simply a solid color and looked a bit featureless, so I set about adding some detail without taking up too many CPU cycles. Since the terrain decoding routine worked in vertical strips of detail, I confined myself to this and came up with the idea of stripes. It was an easy matter to allocate 2 colors to the terrain and draw the vertical strips by alternating between these 2 colors. That also tied in with the scrolling vertical stripes theme I designed in the Title Page heading.
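The alternating color idea can be sketched in a few instructions. This is only a minimal illustration and not the game's actual code: the color values COLOR1 and COLOR2, the STRIPE variable and the NXTCOL routine are all hypothetical names made up for the example, and the syntax may need small adjustments for your assembler.

```
COLOR1  EQU     $11         ; hypothetical fill byte for the first stripe color
COLOR2  EQU     $22         ; hypothetical fill byte for the second stripe color

; Returns in A the fill byte for the next strip and remembers it in STRIPE.
NXTCOL  LDA     STRIPE      ; color used for the previous strip
        CMPA    #COLOR1
        BEQ     USE2        ; previous was COLOR1, so switch to COLOR2
        LDA     #COLOR1     ; otherwise switch back to COLOR1
        BRA     SAVE
USE2    LDA     #COLOR2
SAVE    STA     STRIPE      ; remember the choice for the following strip
        RTS

STRIPE  FCB     COLOR2      ; start on COLOR2 so the first strip drawn uses COLOR1
```

Calling NXTCOL once per strip before drawing it would give the alternating pattern at the cost of a handful of cycles per strip.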

To round off the detail enhancements, I also added the ability to set 2 bytes of the edges of the terrain to be a fixed color. The result is that there is a solid outline to the terrain.

This was a simple way to add some detail with very little additional overhead. A screenshot of the final result is shown below.

 

Speed optimisation

And while we're in enhancement mode, why not have a crack at getting some extra speed!

If you recall from the last chapter, I said that my routine drew a full vertical strip of solid color and then cut out the gap at the appropriate point to create the ceiling and ground spacing. This vertical blast of color does 2 columns simultaneously and uses a technique called loop unrolling. My routine which did the cut-out portion afterwards was still a conventional loop, since the size of the cut-out varies.

Now that too has been loop unrolled and the speed boost has been impressive. The end result is that although I have added code for the striping and edging enhancements, the routine actually takes less time to run than it did before. A case of more-for-less!

The screenshot representations below show the red border regions of my speed measuring routine described in an earlier chapter. Both examples are again worst-case scenarios where the cutout area is at its maximum (no terrain at all).

 

Before the enhancements

After the enhancements

 

What is "Loop Unrolling"?

Loop unrolling is exactly as the term says... unrolling the loop! The easiest way to explain it is with a simple example.

The following assembly language code simply writes the value stored in the A register into 10 sequential memory locations pointed to by the X register. There are faster ways than STA ,X+ to do this but for the sake of keeping the examples simple, this will suffice.

 

EXAMPLE 1 - Conventional loop

LABEL   ASSEMBLY CODE   BYTES USED   CYCLES USED   COMMENTS
        LDB   #10       2            2             Load B with 10. B is used as a counter variable.
LOOP    STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        DECB            1            2             Decrement the B counter variable.
        BNE   LOOP      2            3             Loop back to label "LOOP" until B decrements to zero.

This code occupies only 7 bytes and takes 112 CPU cycles to complete 10 iterations of the loop (10 x 11 cycles, plus 2 for the LDB #10).

EXAMPLE 2 - Unrolled loop

LABEL   ASSEMBLY CODE   BYTES USED   CYCLES USED   COMMENTS
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.
        STA   ,X+       2            6             Store A into the position pointed to by X and auto-increment X.

This code occupies 20 bytes and takes 60 CPU cycles to complete.

It is clear that Example 1 is more efficient in memory space than Example 2 (7 bytes versus 20 bytes), but Example 2 is more efficient in speed (60 cycles versus 112 cycles).

Example 2 removes the need for the loop structure itself, but at the cost of memory space. In these examples, the operation is repeated only 10 times. Memory usage becomes a problem if you need to repeat an operation 100 or more times, and for this I use a combination of both structures.

In these situations, I break the repeating operations into smaller chunks. If I need to repeat 100 times, I would create an unrolled loop of 25 repeated operations and wrap this in a conventional loop of 4 passes. This shrinks my unrolled code to roughly a quarter of its size yet still offers a speed increase over a conventional loop.

Loops which may vary in size can be accommodated by using unrolled loops for the greater part, followed by a conventional loop at the end to take up the remaining count.
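One way this could look is sketched below, assuming the fill byte is in A, the destination in X and the number of stores (0 to 255) in B. The chunk size of 8 and the labels FILL, CHUNK, TAIL, REST and DONE are choices made for this example only, and the routine in the game may well be organised differently.

```
; Assumes A = fill byte, X = destination, B = number of stores (0 to 255).
FILL    PSHS    B           ; keep the full count for the tail loop
        LSRB                ; three right shifts give B = count / 8,
        LSRB                ; the number of whole unrolled chunks
        LSRB
        BEQ     TAIL        ; fewer than 8 stores: no whole chunk to run
CHUNK   STA     ,X+         ; unrolled chunk of 8 stores
        STA     ,X+
        STA     ,X+
        STA     ,X+
        STA     ,X+
        STA     ,X+
        STA     ,X+
        STA     ,X+
        DECB
        BNE     CHUNK       ; repeat for each whole chunk
TAIL    PULS    B           ; recover the full count
        ANDB    #7          ; B = count mod 8, the leftover stores
        BEQ     DONE        ; nothing left over
REST    STA     ,X+         ; conventional loop for the remainder
        DECB
        BNE     REST
DONE    RTS
```

For a count of 27, for instance, the unrolled chunk runs 3 times (24 stores) and the conventional loop handles the remaining 3.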

Coming up!

That concludes the scrolling terrain part. Next on the list is to add the terrain objects such as enemy guns and fuel tanks. The data for these is already mapped into the terrain encoding, so all that remains is to add the decoding to the terrain generating code.

  

                                 

Copyright 2013 by Nickolas Marentes