SPO600 - Lab 2 Part 2

This blog post is for the SPO600 class I am taking at Seneca College. It covers Lab 2, detailed here.

The first part of this lab involved analyzing a piece of assembly code and calculating how long it would take to execute completely. That was covered in my first post, which can be found here.

The next step in the lab is to modify the code in a few ways. The first is to reduce the time it takes to fill the screen with one colour. The second is simply to fill the display with solid light blue instead of yellow. The third is to colour each of the four pages (quarters of the display) a different colour.

I'll start with the simple ones and give the display some new colours.

A Quick Change of Scenery

This part will be pretty simple, but let's break it down regardless. Let's have a look at the code once again.

	lda #$00	; set a pointer at $40 to point to $0200
	sta $40
	lda #$02
	sta $41

	lda #$07	; colour number

	ldy #$00	; set index to 0

loop:	sta ($40),y	; set pixel at the address (pointer)+Y

	iny		; increment index
	bne loop	; continue until done the page

	inc $41		; increment the page
	ldx $41		; get the current page number
	cpx #$06	; compare with 6
	bne loop	; continue until done all pages 

What do we change to modify the colour? In my first post I went into a bit of detail on what the program is doing, but I never explored how the pixels are actually painted.

It begins by storing the values 00 and 02 into memory locations $40 and $41 with the STA instruction, loading each value into the accumulator first. Because the 6502 is little endian (low byte first), this gives us a pointer to memory location $0200, the first pixel in the first page of the display.

This sets us up for a few important steps, the first of which is painting our first pixel. We load the accumulator with the value for the colour we want, in this case 07, the original dull yellow. With the accumulator loaded, the 'STA ($40),y' instruction stores that value at the address held in the pointer at $40 and $41, plus the Y register. Y is 0 at the moment, so this stores our yellow colour into $0200.
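
To make the pointer arithmetic concrete, here is a minimal sketch that paints a single pixel somewhere other than $0200. It is not part of the lab code, and the offset $05 is an arbitrary value chosen only for illustration.

```asm
	lda #$00	; low byte of the pointer goes in the lower address, $40
	sta $40
	lda #$02	; high byte goes in the next address up, $41
	sta $41		; $40/$41 now hold 00 02, read as the address $0200

	lda #$07	; colour number (yellow)
	ldy #$05	; arbitrary offset, chosen only for this example
	sta ($40),y	; stores to $0200 + $05 = $0205, the sixth pixel
```

Changing the value in Y moves the pixel within the page, while changing the value in $41 moves it to a different page.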


The next step is incrementing the Y register to step through each address in the 02 page (0200 through 02FF). Each pass through the loop calls 'INY', which increments the Y register by one, so the next 'STA ($40),y' paints the next pixel in line. Once all 256 pixels in the 02 page have been painted, Y wraps around to 0 and the loop exits.


The final step is the most important one for the pointer we set up. The pointer has its high byte stored in memory location $41, and this byte determines the page we are on. Since we already painted the 02 page and need to paint all of 02 through 05, we just need to increment the value stored in $41. This is done with 'INC $41', moving us to the 03 page. We then branch back to the top of the loop and paint each pixel in the 03 page, and so on until the entire display is painted. To tell when we are finished, the program loads the current page number into the X register and compares it to 06. A match means we have moved past 05 and are done!


After all that, changing the colour is pretty simple! The requested colour is light blue, which means loading the accumulator with the value 0e instead of 07 just before we begin painting. The next modification is a fun one: making the program paint a new colour on each page. My favourite way was shown off in class, and that is loading the accumulator with the number of the page we're on each time the page is incremented. If we begin with a colour value of 02 and do this each time the page changes, we'll paint the 02 page red, the 03 page cyan, the 04 page purple, and the 05 page green.

This method is by far my favourite, it's the colour the pages are meant to be!
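
A minimal sketch of how the page-number-as-colour version could look is below. It is one possible version and not necessarily the one shown in class: it assumes the accumulator is reloaded from $41 after each 'INC $41', and it swaps the 'LDX'/'CPX' pair for 'LDA'/'CMP' so that one register serves as both the colour and the end-of-display check.

```asm
	lda #$00	; set a pointer at $40 to point to $0200
	sta $40
	lda #$02
	sta $41

	lda $41		; colour = current page number ($02 = red)
	ldy #$00	; set index to 0

loop:	sta ($40),y	; set pixel at the address (pointer)+Y

	iny		; increment index
	bne loop	; continue until done the page

	inc $41		; increment the page
	lda $41		; new page number doubles as the new colour
	cmp #$06	; compare with 6 (CMP leaves the accumulator unchanged)
	bne loop	; continue until done all pages
```

For the plain light blue fill, the only change to the original listing is replacing 'lda #$07' with 'lda #$0e'.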

Make It Faster

The final step in the lab is investigating how to make the program faster. That may be difficult, since to me it already seems pretty low to the ground. My first thought was finding something to hoist out of the loops. No dice there: everything in there is important and needs to happen each time the loop executes. The next place I looked was the addressing modes of our instructions. When researching what this program was doing, I noticed that 'STA' with indirect indexed addressing, as in 'STA ($40),y', takes 6 cycles, one more than the 5 cycles it takes with absolute indexed addressing, as in 'STA $0200,y'.

Does using absolute addressing save much time? Not in the grand scheme of things, no, but it DOES save 1 cycle per pixel painted, and we are painting 256 pixels per page across four pages. That adds up. Now what does the program look like if we use absolute addressing to paint the pixels? It looks really bad, in my opinion. The original solution is relatively elegant, considering this is assembly and I doubt many people would use that word when looking at assembly code. The use of a pointer to the first pixel, with an increment to move through the pages, is wonderful.

The faster solution looks like this:

	lda #$0e	; colour number (light blue)
	ldy #$00	; set index to 0

loop:	sta $0200,y	; set pixel at address $0200+Y

	iny		; increment index
	bne loop	; continue until done the page

loop2:	sta $0300,y	; set pixel at address $0300+Y

	iny		; increment index
	bne loop2	; continue until done the page

loop3:	sta $0400,y	; set pixel at address $0400+Y

	iny		; increment index
	bne loop3	; continue until done the page

loop4:	sta $0500,y	; set pixel at address $0500+Y

	iny		; increment index
	bne loop4	; continue until done the page

One loop for each page. This accomplishes three things, one of which matters far more than the others for execution time.

First, it removes the instructions before the loops that set up the pointer to our first pixel. This is inconsequential: it removes 10 cycles, not a big deal.

Next, it removes the second portion of our loop, the one responsible for incrementing the high byte in $41 and checking for the last page. This is also negligible, saving 51 cycles.

The final improvement is that every pixel is now painted with an absolute indexed 'STA'. Each loop is responsible for a different page, and we still paint each pixel by offsetting with the incremented Y register. This change alone saves us a total of 1024 cycles (1 cycle for each of the 4 x 256 pixels), which is very nearly one tenth of the original cycle count. The total difference, adding up each individual change, is 1085 cycles.
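
The arithmetic behind those numbers can be sketched by annotating one page of each version with per-instruction cycle counts. The counts below are the standard 6502 timings and assume no branch crosses a page boundary (which would add a cycle). The labels 'slow' and 'fast' are just names made up for this example.

```asm
	lda #$00	; 2 cycles
	sta $40		; 3 cycles
	lda #$02	; 2 cycles
	sta $41		; 3 cycles, pointer setup total: 10

	lda #$0e	; 2 cycles
	ldy #$00	; 2 cycles

slow:	sta ($40),y	; 6 cycles x 256 = 1536
	iny		; 2 cycles x 256 =  512
	bne slow	; 3 x 255 taken + 2 not taken = 767
			; page total: 2815 cycles

fast:	sta $0300,y	; 5 cycles x 256 = 1280
	iny		; 2 cycles x 256 =  512
	bne fast	; 3 x 255 taken + 2 not taken = 767
			; page total: 2559 cycles, 256 fewer
```

Run as written, this paints the 02 page through the pointer and the 03 page with absolute addressing. Four pages at 256 cycles each gives the 1024 cycles saved. The 51 cycles of page-increment code come from 'INC $41' (5), 'LDX $41' (3), 'CPX #$06' (2) and 'BNE' (3 taken, 2 on the final pass), which is 13 + 13 + 13 + 12.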

It may not be pretty, but it is faster! The only problem is that this code is slightly bigger. Though when I say slightly, I mean it is 28 bytes vs. 25 bytes for the original listing, counting two bytes for each zero-page instruction that touches $40 or $41.

A pretty good trade for some ugly, speedy code!
