Copyright © 2000-2001 Christopher S. Charabaruk and Matthew R. Knight. All rights reserved. All articles, tutorials, etc. copyright © by the original authors unless otherwise noted. QB Cult Magazine is the exclusive property and copyright of Christopher Steffan Charabaruk and Matthew R. Knight.
Welcome to the third issue of our second year. In this issue you will read about sorting and searching, texture mapping, rpg delevoping, bitmap loading, and some other small stuff. I hope you'll like it.
And please help QBCM by sending articles, news or whatever, cause it hardly gets any content at all.
Mikael Andersson (Sane), editor
Note: Regarding BASIC Techniques and Utilities, it was originally published by Ziff-Davis Press, producers of PC Magazine and other computer related publications. After ZD stopped printing it, they released the rights to Ethan Winer, the author. After communicating with Ethan by e-mail, he allowed me to reproduce his book, chapter by chapter, in QBCM. You can find a full text version of BASIC Techniques and Utilities at Ethan's website <www.ethanwiner.com>, or wait until the serialization is complete, when I will have an HTML version ready.
Hi,
"QB Cult Magazine Vol.2 Iss. 2 - May 2001" mentions our company (Ocelot) and its database product. I'd just like to correct the address + contact information:
Ocelot Computer Services Inc., 8303 142 Ave, Edmonton AB Canada 780-472-6838
Incidentally you can download a free copy of our SQL DBMS from: http://ourworld.compuserve.com/homepages/OCELOTSQL/download.htm Thanks.
Peter Gulutzan
I thought this was close enough to a letter to the editor, and this was the easiest way to give it to you readers :)
Thanks go to Hard Rock for submitting some of the news.
To place an ad, please e-mail <qbcm@tekscode.com>, subject QB Ads. Include your name (real or fake), e-mail address, and message. You may include HTML formatting (but risk the chance of the formatting being thrown out).
By Ethan Winer <ethan@ethanwiner.com>
Two fundamental operations required of many applications are searching and sorting the data they operate on. Many different types of data are commonly sorted, such as customer names, payment due dates, or even a list of file names displayed in a file selection menu. If you are writing a programmer's cross reference utility, you may need to sort a list of variable names without regard to capitalization. In some cases, you may want to sort several pieces of related information based on the contents of only one of them. One example of that is a list of names and addresses sorted in ascending zip code order.
Searching is equally important; for example, to locate a customer name in an array or disk file. In some cases you may wish to search for a complete match, while in others a partial match is needed. If you are searching a list of names for, say, Leonard, you probably would want to ignore Leonardo. But when searching a list of zip codes you may need to locate all that begin with the digits 068. There are many different ways sorting and searching can be accomplished, and the subject is by no means a simple one.
Most programmers are familiar with the Bubble Sort, because it is the simplest to understand. Each adjacent pair of items is compared, and then exchanged if they are out of order. This process is repeated over and over, until the entire list has been examined as many times as there are items. Unfortunately, these repeated comparisons make the Bubble Sort an extremely poor performer. Similarly, code to perform a linear search that simply examines each item in succession for a match is easy to grasp, but it will be painfully slow when there are many items.
In this chapter you will learn how sophisticated algorithms that handle these important programming chores operate. You will also learn how to sort data on more than one key. Often, it is not sufficient to merely sort a list of customers by their last name. For example, you may be expected to sort first by last name, then by first name, and finally by balance due. That is, all of the last names would first be sorted. Then within all of the Smiths you would sort again by first name, and for all of the John Smiths sort that subgroup based on how much money is owed.
For completeness I will start each section by introducing sorting and searching methods that are easy to understand, and then progress to the more complex algorithms that are much more effective. Specifically, I will show the Quick Sort and Binary Search algorithms. When there are many thousands of data items, a good algorithm can make the difference between a sort routine that takes ten minutes to complete, and one that needs only a few seconds.
Finally, I will discuss both BASIC and assembly language sort routines. As important as the right algorithm is for good performance, an assembly language implementation will be even faster. Chapter 12 describes how assembly language routines are written and how they work, and in this chapter I will merely show how to use the routines included with this book.
Although there are many different ways to sort an array, the simplest sorting algorithm is the Bubble Sort. The name Bubble is used because a FOR/NEXT loop repeatedly examines each adjacent pair of elements in the array, and those that have higher values rise to the top like bubbles in a bathtub. The most common type of sort is ascending, which means that "A" comes before "B", which comes before "C", and so forth. Figure 8-1 shows how the name Zorba ascends to the top of a five-item list of first names.
Initial array contents: Element 4 Kathy Element 3 Barbara Element 2 Cathy Element 1 Zorba < After 1 pass: Element 4 Kathy Element 3 Barbara Element 2 Zorba < Element 1 Cathy After 2 passes: Element 4 Kathy Element 3 Zorba < Element 2 Barbara Element 1 Cathy After 3 passes: Element 4 Zorba < Element 3 Kathy Element 2 Barbara Element 1 Cathy
The Bubble Sort routine that follows uses a FOR/NEXT loop to repeatedly examine an array and exchange elements as necessary, until all of the items are in the correct order.
DEFINT A-Z DECLARE SUB BubbleSort (Array$()) CONST NumItems% = 20 CONST False% = 0 CONST True% = -1 DIM Array$(1 TO NumItems%) FOR X = 1 TO NumItems% READ Array$(X) NEXT CALL BubbleSort(Array$()) CLS FOR X = 1 TO NumItems% PRINT Array$(X) NEXT DATA Zorba, Cathy, Barbara, Kathy, Josephine DATA Joseph, Joe, Peter, Arnold, Glen DATA Ralph, Elli, Lucky, Rocky, Louis DATA Paula, Paul, Mary Lou, Marilyn, Keith END SUB BubbleSort (Array$()) STATIC DO OutOfOrder = False% 'assume it's sorted FOR X = 1 TO UBOUND(Array$) - 1 IF Array$(X) > Array$(X + 1) THEN SWAP Array$(X), Array$(X + 1) 'if we had to swap OutOfOrder = True% 'we may not be done END IF NEXT LOOP WHILE OutOfOrder END SUB
This routine is simple enough to be self-explanatory, and only a few things warrant discussing. One is the OutOfOrder flag variable. When the array is nearly sorted to begin with, fewer passes through the loop are needed. The OutOfOrder variable determines when no more passes are necessary. It is cleared at the start of each loop, and set each time two elements are exchanged. If, after examining all of the elements in one pass no exchanges were required, then the sorting is done and there's no need for the DO loop to continue.
The other item worth mentioning is that the FOR/NEXT loop is set to consider one element less than the array actually holds. This is necessary because each element is compared to the one above it. If the last element were included in the loop, then BASIC would issue a "Subscript out of range" error on the statement that examines Array$(X + 1).
There are a number of features you can add to this Bubble Sort routine. For example, you could sort without regard to capitalization. In that case "adams" would come before "BAKER", even though the lowercase letter "a" has a higher ASCII value than the uppercase letter "B". To add that capability simply use BASIC's UCASE$ (or LCASE$) function as part of the comparisons:
IF UCASE$(Array$(X)) > UCASE$(Array$(X + 1)) THEN
And to sort based on the eight-character portion that starts six bytes into each string you would use this:
IF MID$(Array$(X), 5, 8) > MID$(Array$(X + 1), 5, 8) THEN
Although the comparisons in this example are based on just a portion of each string, the SWAP statement must exchange the entire elements. This opens up many possibilities as you will see later in this chapter.
If there is a chance that the strings may contain trailing blanks that should be ignored, you can use RTRIM$ on each pair of elements:
IF RTRIM$(Array$(X)) > RTRIM$(Array$(X + 1)) THEN
Of course, you can easily combine these enhancements to consider only the characters in the middle after they have been converted to upper or lower case.
Sorting in reverse (descending) order is equally easy; you'd simply replace the greater-than symbol (>) with a less-than symbol (<).
Finally, you can modify the routine to work with any type of data by changing the array type identifier. That is, for every occurrence of Array$ you will change that to Array% or Array# or whatever is appropriate. If you are sorting a numeric array, then different modifications may be in order. For example, to sort ignoring whether the numbers are positive or negative you would use BASIC's ABS (absolute value) function:
IF ABS(Array!(X)) > ABS(Array!(X + 1)) THEN
It is important to point out that all of the simple modifications described here can also be applied to the more sophisticated sort routines we will look at later in this chapter.
Besides the traditional sorting methods--whether a Bubble Sort or Quick Sort or any other type of sort--there is another category of sort routine you should be familiar with. Where a conventional sort exchanges elements in an array until they are in order, an Index Sort instead exchanges elements in a parallel numeric array of *pointers*. The original data is left intact, so it may still be accessed in its natural order. However, the array can also be accessed in sorted order by using the element numbers contained in the index array.
As with a conventional sort, the comparisons in an indexed sort routine examine each element in the primary array, but based on the element numbers in that index array. If it is determined that the data is out of order, the routine exchanges the elements in the index array instead of the primary array. A modification to the Bubble Sort routine to sort using an index is shown below.
DEFINT A-Z DECLARE SUB BubbleISort (Array$(), Index()) CONST NumItems% = 20 CONST False% = 0 CONST True% = -1 DIM Array$(1 TO NumItems%) 'this holds the string data DIM Ndx(1 TO NumItems%) 'this holds the index FOR X = 1 TO NumItems% READ Array$(X) 'read the string data Ndx(X) = X 'initialize the index array NEXT CALL BubbleISort(Array$(), Ndx()) CLS FOR X = 1 TO NumItems% PRINT Array$(Ndx(X)) 'print based on the index NEXT DATA Zorba, Cathy, Barbara, Kathy, Josephine DATA Joseph, Joe, Peter, Arnold, Glen DATA Ralph, Elli, Lucky, Rocky, Louis DATA Paula, Paul, Mary lou, Marilyn, Keith SUB BubbleISort (Array$(), Index()) STATIC DO OutOfOrder = False% 'assume it's sorted FOR X = 1 TO UBOUND(Array$) - 1 IF Array$(Index(X)) > Array$(Index(X + 1)) THEN SWAP Index(X), Index(X + 1) 'if we had to swap OutOfOrder% = True% 'we're not done yet END IF NEXT LOOP WHILE OutOfOrder% END SUB
In this indexed sort, all references to the data are through the index array. And when a swap is necessary, it is the index array elements that are exchanged. Note that an indexed sort requires that the index array be initialized to increasing values--even if the sort routine is modified to be descending instead of ascending. Therefore, when BubbleISort is called Ndx(1) must hold the value 1, Ndx(2) is set to 2, and so forth.
In this example the index array is initialized by the caller. However, it would be just as easy to put that code into the subprogram itself. Since you can't pass an array that hasn't yet been dimensioned, it makes the most sense to do both steps outside of the subprogram. Either way, the index array must be assigned to these initial values.
As I mentioned earlier, one feature of an indexed sort is that it lets you access the data in both its original and sorted order. But there are other advantages, and a disadvantage as well. The disadvantage is that each comparison takes slightly longer, because of the additional overhead required to first look up the element number in the index array, to determine which elements in the primary array will be compared. In some cases, though, that can be more than offset by requiring less time to exchange elements.
If you are sorting an array of 230-byte TYPE variables, the time needed for SWAP to exchange the elements can become considerable. Every byte in both elements must be read and written, so the time needed increases linearly as the array elements become longer. Contrast that with the fixed two bytes in the integer index array that are swapped.
Another advantage of an indexed sort is that it lends itself to sorting more data than can fit in memory. As you will see later in the section that shows how to sort files, it is far easier to manipulate an integer index than an entire file. Further, sorting the file data using multiple passes requires twice as much disk space as the file already occupies.
Before I show the Quick Sort algorithm that will be used as a basis for the remaining sort examples in this chapter, you should also be aware of a few simple tricks that can help you maintain and sort your data. One was described in Chapter 6, using a pair of functions that pack and unpack dates such that the year is stored before the month, which in turn is before the day. Thus, date strings are reduced to only three characters each, and they can be sorted directly.
Another useful speed-up trick is to store string data as integers or long integers. If you had a system of four-digit account numbers you could use an integer instead of a string. Besides saving half the memory and disk space, the integer comparisons in a sort routine will be many times faster than a comparison on string equivalents. Zip codes are also suited to this, and could be stored in a long integer. Even though the space savings is only one byte, the time needed to compare the values for sorting will be greatly reduced.
This brings up another important point. As you learned in Chapter 2, all conventional (not fixed-length) strings require more memory than might be immediately apparent. Besides the amount of memory needed to hold the data itself, four additional bytes are used for a string descriptor, and two more beyond those for a back pointer. Therefore, a zip code stored as a string will actually require eleven bytes rather than the five you might expect. With this in mind, you may be tempted to think that using a fixed- length string to hold the zip codes will solve the problem. Since fixed- length strings do not use either descriptors or back pointers, they do not need the memory they occupy. And that leads to yet another issue.
Whenever a fixed-length string or the string portion of a TYPE variable is compared, it must first be converted to a regular descriptored string. BASIC has only one string comparison routine, and it expects the addresses for two conventional string descriptors. Every time a fixed-length string is used as an argument for comparison, BASIC must create a temporary copy, call its comparison routine, and then delete the copy. This copying adds code and wastes an enormous amount of time; in many cases the copying will take longer than the comparison itself. Therefore, using integers and long integers for numeric data where possible will provide more improvement than just the savings in memory use.
In some cases, however, you must use fixed-length string or TYPE arrays. In particular, when sorting information from a random access disk file it is most sensible to load the records into a TYPE array. And as you learned in Chapter 2, the string components of a TYPE variable or array element are handled by BASIC as a fixed-length string. So how can you effectively sort fixed-length string arrays without incurring the penalty BASIC's overhead imposes? With assembly language subroutines, of course!
Rather than ask BASIC to pass the data using its normal methods, assembly language routines can be invoked passing the data segments and addresses directly. When you use SEG, or a combination of VARSEG and VARPTR with fixed-length and TYPE variables, BASIC knows that you want the segmented address of the variable or array element. Thus, you are tricking BASIC into not making a copy as it usually would when passing such data. An assembly language subroutine or function can be designed to accept data addresses in any number of ways. As you will see later when we discuss sorting on multiple keys, extra trickery is needed to do the same thing in a BASIC procedure.
The three short assembly language functions that follow compare two portions of memory, and then return a result that can be tested by your program.
;COMPARE.ASM - compares two ranges of memory .Model Medium, Basic .Code Compare Proc Uses DS ES DI SI, SegAdr1:DWord, _ SegAdr2:DWord, NumBytes:Word Cld ;compare in the forward direction Mov SI,NumBytes ;get the address for NumBytes% Mov CX,[SI] ;put it into CX for comparing below Les DI,SegAdr1 ;load ES:DI with the first ; segmented address Lds SI,SegAdr2 ;load DS:SI with the second ; segmented address Repe Cmpsb ;do the compare Mov AX,0 ;assume the bytes didn't match Jne Exit ;we were right, skip over Dec AX ;wrong, decrement AX down to -1 Exit: Ret ;return to BASIC Compare Endp End
;COMPARE2.ASM - compares memory case-insensitive .Model Medium, Basic .Code Compare2 Proc Uses DS ES DI SI, SegAdr1:DWord, _ SegAdr2:DWord, NumBytes:Word Cld ;compare in the forward direction Mov BX,-1 ;assume the ranges are the same Mov SI,NumBytes ;get the address for NumBytes% Mov CX,[SI] ;put it into CX for comparing below Jcxz Exit ;if zero bytes were given, they're ; the same Les DI,SegAdr1 ;load ES:DI with the first address Lds SI,SegAdr2 ;load DS:SI with the second address Do: Lodsb ;load the current character from ; DS:SI into AL Call Upper ;capitalize as necessary Mov AH,AL ;copy the character to AH Mov AL,ES:[DI] ;load the other character into AL Inc DI ;point at the next one for later Call Upper ;capitalize as necessary Cmp AL,AH ;now, are they the same? Jne False ;no, exit now and show that Loop Do ;yes, continue Jmp Short Exit ;if we get this far, the bytes are ; all the same False: Inc BX ;increment BX to return zero Exit: Mov AX,BX ;assign the function output Ret ;return to BASIC Upper: Cmp AL,"a" ;is the character below an "a"? Jb Done ;yes, so we can skip it Cmp AL,"z" ;is the character above a "z"? Ja Done ;yes, so we can skip that too Sub AL,32 ;convert to upper case Done: Retn ;do a near return to the caller Compare2 Endp End
;COMPARE3.ASM - case-insensitive, greater/less than .Model Medium, Basic .Code Compare3 Proc Uses DS ES DI SI, SegAdr1:DWord, _ SegAdr2:DWord, NumBytes:Word Cld ;compare in the forward direction Xor BX,BX ;assume the ranges are the same Mov SI,NumBytes ;get the address for NumBytes% Mov CX,[SI] ;put it into CX for comparing below Jcxz Exit ;if zero bytes were given, they're ; the same Les DI,SegAdr1 ;load ES:DI with the first address Lds SI,SegAdr2 ;load DS:SI with the second address Do: Lodsb ;load the current character from ; DS:SI into AL Call Upper ;capitalize as necessary, remove for ; case-sensitive Mov AH,AL ;copy the character to AH Mov AL,ES:[DI] ;load the other character into AL Inc DI ;point at the next character for later Call Upper ;capitalize as necessary, remove for ; case-sensitive Cmp AL,AH ;now, are they the same? Loope Do ;yes, continue Je Exit ;we exhausted the data and they're ; the same Mov BL,1 ;assume block 1 was "greater" Ja Exit ;we assumed correctly Dec BX ;wrong, bump BX down to -1 Dec BX Exit: Mov AX,BX ;assign the function output Ret ;return to BASIC Upper: Cmp AL,"a" ;is the character below an "a"? Jb Done ;yes, so we can skip it Cmp AL,"z" ;is the character above a "z"? Ja Done ;yes, so we can skip that too Sub AL,32 ;convert to upper case Done: Retn ;do a near return to the caller Compare3 Endp End
The first Compare routine above simply checks if all of the bytes are identical, and returns -1 (True) if they are, or 0 (False) if they are not. By returning -1 or 0 you can use either
IF Compare%(Type1, Type2, NumBytes%) THEN
or
IF NOT Compare%(Type1, Type2, NumBytes%) THEN
depending on which logic is clearer for your program. Compare2 is similar to Compare, except it ignores capitalization. That is, "SMITH" and Smith" are considered equal. The Compare3 function also compares memory and ignores capitalization, but it returns either -1, 0, or 1 to indicate if the first data range is less than, equal to, or greater than the second.
The correct declaration and usage for each of these routines is shown below. Note that Compare and Compare2 are declared and used in the same fashion.
Compare and Compare2:
DECLARE FUNCTION Compare%(SEG Type1 AS ANY, SEG Type2 AS ANY, _ NumBytes%) Same = Compare%(Type1, Type2, NumBytes%)
or
DECLARE FUNCTION Compare%(BYVAL Seg1%, BYVAL Adr1%, BYVAL Seg2%, _ BYVAL Adr2%, NumBytes%) Same = Compare%(Seg1%, Adr1%, Seg2%, Adr2%, NumBytes%)
Here, Same receives -1 if the two TYPE variables or ranges of memory are the same, or 0 if they are not. NumBytes% tells how many bytes to compare.
Compare3:
DECLARE FUNCTION Compare3%(SEG Type1 AS ANY, SEG Type2 AS ANY, _ NumBytes%) Result = Compare3%(Type1, Type2, NumBytes%)
or
DECLARE FUNCTION Compare3%(BYVAL Seg1%, BYVAL Adr1%, BYVAL Seg2%, _ BYVAL Adr2%, NumBytes%) Result = Compare3%(Seg1%, Adr1%, Seg2%, Adr2%, NumBytes%)
Result receives 0 if the two type variables or ranges of memory are the same, -1 if the first is less when compared as strings, or 1 if the first is greater. NumBytes% tells how many bytes are to be to compared. In the context of a sort routine you could invoke Compare3 like this:
IF Compare3%(TypeEl(X), TypeEl(X + 1), NumBytes%) = 1 THEN SWAP TypeEl(X), TypeEl(X + 1) END IF
As you can see, these routines may be declared in either of two ways. When used with TYPE arrays the first is more appropriate and results in slightly less setup code being generated by the compiler. When comparing fixed-length strings or arbitrary blocks of memory (for example, when one of the ranges is on the display screen) you should use the second method. Since SEG does not work correctly with fixed-length strings, if you want to use that more efficient version you must create a dummy TYPE comprised solely of a single string portion:
TYPE FixedLength Something AS STRING * 35 END TYPE
Then simply use DIM to create a single variable or an array based on this or a similar TYPE, depending on what your program needs. The requirement to create a dummy TYPE was discussed in Chapter 2, and I won't belabor the reasons again here. These comparison routines will be used extensively in the sort routines presented later in this chapter; however, their value in other, non-sorting situations should also be apparent.
Although these routines are written in assembly language, they are fairly simple to follow. It is important to understand that you do not need to know anything about assembly language to use them. All of the files you need to add these and all of the other routines in this book are contained on the accompanying diskette [here, in the same ZIP file as this text]. Chapter 12 discusses assembly language in great detail, and you can refer there for further explanation of the instructions used.
If you plan to run the programs that follow in the QuickBASIC editor, you must load the BASIC.QLB Quick Library as follows:
qb program /l basic
Later when you compile these or other programs you must link with the parallel BASIC.LIB file:
bc program [/o]; link program , , nul , basic;
If you are using BASIC PDS start QBX using the BASIC7.QLB file, and then link with BASIC7.LIB to produce a stand-alone .EXE program. [VB/DOS users will also use the BASIC7 version.
It should be obvious to you by now that a routine written in assembly language will always be faster than an equivalent written in BASIC. However, simply translating a procedure to assembly language is not always the best solution. Far more important than which language you use is selecting an appropriate algorithm. The best sorting method I know is the Quick Sort, and a well-written version of Quick Sort using BASIC will be many times faster than an assembly language implementation of the Bubble Sort.
The main problem with the Bubble Sort is that the number of comparisons required grows exponentially as the number of elements increases. Since each pass through the array exchanges only a few elements, many passes are required before the entire array is sorted. The Quick Sort was developed by C.A.R. (Tony) Hoare, and is widely recognized as the fastest algorithm available. In some special cases, such as when the data is already sorted or nearly sorted, the Quick Sort may be slightly slower than other methods. But in most situations, a Quick Sort is many times faster than any other sorting algorithm.
As with the Bubble Sort, there are many different variations on how a Quick Sort may be coded. (You may have noticed that the Bubble Sort shown in Chapter 7 used a nested FOR/NEXT loop, while the one shown here uses a FOR/NEXT loop within a DO/WHILE loop.) A Quick Sort divides the array into sections--sometimes called partitions--and then sorts each section individually. Many implementations therefore use recursion to invoke the subprogram from within itself, as each new section is about to be sorted. However, recursive procedures in any language are notoriously slow, and also consume stack memory at an alarming rate.
The Quick Sort version presented here avoids recursion, and instead uses a local array as a form of stack. This array stores the upper and lower bounds showing which section of the array is currently being considered. Another refinement I have added is to avoid making a copy of elements in the array. As a Quick Sort progresses, it examines one element selected arbitrarily from the middle of the array, and compares it to the elements that lie above and below it. To avoid assigning a temporary copy this version simply keeps track of the selected element number.
When sorting numeric data, maintaining a copy of the element is reasonable. But when sorting strings--especially strings whose length is not known ahead of time--the time and memory required to keep a copy can become problematic. For clarity, the generic Quick Sort shown below uses the copy method. Although this version is meant for sorting a single precision array, it can easily be adapted to sort any type of data by simply changing all instances of the "!" type declaration character.
'******** QSORT.BAS, Quick Sort algorithm demonstration 'Copyright (c) 1991 Ethan Winer DEFINT A-Z DECLARE SUB QSort (Array!(), StartEl, NumEls) RANDOMIZE TIMER 'generate a new series each run DIM Array!(1 TO 21) 'create an array FOR X = 1 TO 21 'fill with random numbers Array!(X) = RND(1) * 500 'between 0 and 500 NEXT FirstEl = 6 'sort starting here NumEls = 10 'sort this many elements CLS PRINT "Before Sorting:"; TAB(31); "After sorting:" PRINT "==============="; TAB(31); "==============" FOR X = 1 TO 21 'show them before sorting IF X >= FirstEl AND X <= FirstEl + NumEls - 1 THEN PRINT "==>"; END IF PRINT TAB(5); USING "###.##"; Array!(X) NEXT CALL QSort(Array!(), FirstEl, NumEls) LOCATE 3 FOR X = 1 TO 21 'print them after sorting LOCATE , 30 IF X >= FirstEl AND X <= FirstEl + NumEls - 1 THEN PRINT "==>"; 'point to sorted items END IF LOCATE , 35 PRINT USING "###.##"; Array!(X) NEXT SUB QSort (Array!(), StartEl, NumEls) STATIC REDIM QStack(NumEls \ 5 + 10) 'create a stack array First = StartEl 'initialize work variables Last = StartEl + NumEls - 1 DO DO Temp! = Array!((Last + First) \ 2) 'seek midpoint I = First J = Last DO 'reverse both < and > below to sort descending WHILE Array!(I) < Temp! I = I + 1 WEND WHILE Array!(J) > Temp! J = J - 1 WEND IF I > J THEN EXIT DO IF I < J THEN SWAP Array!(I), Array!(J) I = I + 1 J = J - 1 LOOP WHILE I <= J IF I < Last THEN QStack(StackPtr) = I 'Push I QStack(StackPtr + 1) = Last 'Push Last StackPtr = StackPtr + 2 END IF Last = J LOOP WHILE First < Last IF StackPtr = 0 THEN EXIT DO 'Done StackPtr = StackPtr - 2 First = QStack(StackPtr) 'Pop First Last = QStack(StackPtr + 1) 'Pop Last LOOP ERASE QStack 'delete the stack array END SUB
Notice that I have designed this routine to allow sorting only a portion of the array. To sort the entire array you'd simply omit the StartEl and NumEls parameters, and assign First and Last from the LBOUND and UBOUND element numbers. That is, you will change these:
First = StartEl
and
Last = StartEl + NumEls - 1
to these:
First = LBOUND(Array!)
and
Last = UBOUND(Array!)
As I mentioned earlier, the QStack array serves as a table of element numbers that reflect which range of elements is currently being considered. You will need to dimension this array to one element for every five elements in the primary array being sorted, plus a few extra for good measure. In this program I added ten elements, because one stack element for every five main array elements is not enough for very small arrays. For data arrays that have a large amount of duplicated items, you will probably need to increase the size of the stack array.
Note that this ratio is not an absolute--the exact size of the stack that is needed depends on how badly out of order the data is to begin with. Although it is possible that one stack element for every five in the main array is insufficient in a given situation, I have never seen this formula fail. Because the stack is a dynamic integer array that is stored in far memory, it will not impinge on near string memory. If this routine were designed using the normal recursive method, BASIC's stack would be used which is in near memory.
Each of the innermost DO loops searches the array for the first element in each section about the midpoint that belongs in the other section. If the elements are indeed out of order (when I is less than J) the elements are exchanged. This incrementing and comparing continues until I and J cross. At that point, assuming the variable I has not exceeded the upper limits of the current partition, the partition bounds are saved and Last is assigned to the top of the next inner partition level. When the entire partition has been processed, the previous bounds are retrieved, but as a new set of First and Last values. This process continues until no more partition boundaries are on the stack. At that point the entire array is sorted.
On the accompanying disk you will find a program called SEEQSORT.BAS that contains an enhanced version of the QSort demo and subprogram. This program lets you watch the progress of the comparisons and exchanges as they are made, and actually see this complex algorithm operate. Simply load SEEQSORT.BAS into the BASIC editor and run it. A constant named Delay! is defined at the beginning of the program. Increasing its value makes the program run more slowly; decreasing it causes the program to run faster.
As fast as the BASIC QuickSort routine is, we can make it even faster. The listing below shows an assembly language version that is between ten and twenty percent faster, depending on which compiler you are using and if the BASIC PDS /fs (far strings) option is in effect.
;SORT.ASM - sorts an entire BASIC string array .Model Medium, Basic .Data S DW 0 F DW 0 L DW 0 I DW 0 J DW 0 MidPoint DW 0 .Code Extrn B$SWSD:Proc ;this swaps two strings Extrn B$SCMP:Proc ;this compares two strings Sort Proc Uses SI DI ES, Array:Word, Dir:Word Cld ;all fills and compares are forward Push DS ;set ES = DS for string compares Pop ES Xor CX,CX ;clear CX Mov AX,7376h ;load AL and AH with the opcodes ; Jae and Jbe in preparation for ; code self-modification Mov BX,Dir ;get the sorting direction Cmp [BX],CX ;is it zero (ascending sort)? Je Ascending ;yes, skip ahead Xchg AL,AH ;no exchange the opcodes Ascending: Mov CS:[X1],AH ;install correct comparison opcodes Mov CS:[X2],AL ; based on the sort direction Mov BX,Array ;load the array descriptor address Mov AX,[BX+0Eh] ;save the number of elements Dec AX ;adjust the number to zero-based Jns L0 ;at least 1 element, continue Jmp L4 ;0 or less elements, get out now! L0: Mov BX,Array ;reload array descriptor address Mov BX,[BX] ;Array$(LBOUND) descriptor address Mov S,SP ;StackPtr = 0 (normalized to SP) Mov F,CX ;F = 0 Mov L,AX ;L = Size% ;----- calculate the value of MidPoint L1: Mov DI,L ;MidPoint = (L + F) \ 2 Add DI,F Shr DI,1 Mov MidPoint,DI Mov AX,F ;I = F Mov I,AX Mov AX,L ;J = L Mov J,AX ;----- calculate the offset into the descriptor table for Array$(MidPoint) L1_2: Shl DI,1 ;multiply MidPoint in DI times 4 Shl DI,1 ;now DI holds how far beyond Array$(Start) ; Array$(MidPoint)'s descriptor is Add DI,BX ;add the array base address to produce the final ; address for Array$(MidPoint) ;----- calculate descriptor offset for Array$(I) L2: Mov SI,I ;put I into SI Shl SI,1 ;as above Shl SI,1 ;now SI holds how far beyond Array$(Start) ; Array$(I)'s descriptor is Add SI,BX ;add the base to produce the final descriptor ; address ;IF Array$(I) < Array$(MidPoint) THEN I = I + 1: GOTO L2 Push BX ;save BX because B$SCMP trashes it Push SI Push DI Call B$SCMP ;do the compare Pop BX ;restore BX X1 Label Byte ;modify the code below to "Jbe" if descending sort Jae L2_1 ;Array$(I) isn't less, continue on Inc Word Ptr I ;I = I + 1 Jmp Short L2 ;GOTO L2 ;----- calculate descriptor offset for Array$(J) L2_1: Mov SI,J ;put J into SI Shl SI,1 ;as above Shl SI,1 ;now SI holds how far beyond Array$(Start) ; Array$(J)'s descriptor is Add SI,BX ;add the base to produce the final descriptor ; address ;IF Array$(J) > Array$(MidPoint) THEN J = J - 1: GOTO L2.1 Push BX ;preserve BX Push SI Push DI Call B$SCMP ;do the compare Pop BX ;restore BX X2 Label Byte ;modify the code below to "Jae" if descending sort Jbe L2_2 ;Array$(J) isn't greater, continue on Dec Word Ptr J ;J = J - 1 Jmp Short L2_1 ;GOTO L2.1 L2_2: Mov AX,I ;IF I > J GOTO L3 Cmp AX,J Jg L3 ;J is greater, go directly to L3 Je L2_3 ;they're the same, skip the swap ;Swap Array$(I), Array$(J) Mov SI,I ;put I into SI Mov DI,J ;put J into DI Cmp SI,MidPoint ;IF I = MidPoint THEN MidPoint = J Jne No_Mid1 ;not equal, skip ahead Mov MidPoint,DI ;equal, assign MidPoint = J Jmp Short No_Mid2 ;don't waste time comparing again No_Mid1: Cmp DI,MidPoint ;IF J = MidPoint THEN MidPoint = I Jne No_Mid2 ;not equal, skip ahead Mov MidPoint,SI ;equal, assign MidPoint = I No_Mid2: Mov SI,I ;put I into SI Shl SI,1 ;multiply times four for the Shl SI,1 ; for the descriptors Add SI,BX ;add address for first descriptor Mov DI,J ;do the same for J in DI Shl DI,1 Shl DI,1 Add DI,BX Push BX ;save BX because B$SWSD destroys it Call B$SWSD ;and swap 'em good Pop BX L2_3: Inc Word Ptr I ;I = I + 1 Dec Word Ptr J ;J = J - 1 Mov AX,I ;IF I <= J GOTO L2 Cmp AX,J Jg L3 ;it's greater, skip to L3 Mov DI,MidPoint ;get MidPoint again Jmp L1_2 ;go back to just before L2 L3: Mov AX,I ;IF I < L THEN PUSH I: PUSH L Cmp AX,L Jnl L3_1 ;it's not less, so skip Pushes Push I ;Push I Push L ;Push L L3_1: Mov AX,J ;L = J Mov L,AX Mov AX,F ;IF F < L GOTO L1 Cmp AX,L Jnl L3_2 ;it's not less, jump ahead to L3_2 Jmp L1 ;it's less, go to L1 L3_2: Cmp S,SP ;IF S = 0 GOTO L4 Je L4 Pop L ;Pop L Pop F ;Pop F Jmp L1 ;GOTO L1 L4: Ret ;return to BASIC Sort Endp End
Besides being faster than the BASIC version, the assembly language Sort routine is half the size. This version also supports sorting either forward or backward, but not just a portion of an array. The general syntax is:
CALL Sort(Array$(), Direction)
Where Array$() is any variable-length string array, and Direction is 0 for ascending, or any other value for descending. Note that this routine calls upon BASIC's internal services to perform the actual comparing and swapping; therefore, the exact same code can be used with either QuickBASIC or BASIC PDS. Again, I refer you forward to Chapter 12 for an explanation of the assembly language commands used in SORT.ASM.
In many situations, sorting based on one key is sufficient. For example, if you are sorting a mailing list to take advantage of bulk rates you must sort all of the addresses in order by zip code. When considering complex data such as a TYPE variable, it is easy to sort the array based on one component of each element. The earlier Bubble Sort example showed how MID$ could be used to consider just a portion of each string, even though the entire elements were exchanged. Had that routine been designed to operate on a TYPE array, the comparisons would have examined just one component, but the SWAP statements would exchange entire elements:
IF Array(X).ZipCode > Array(X + 1).ZipCode THEN SWAP Array(X), Array(X + 1) END IF
This way, each customer's last name, first name, street address, and so forth remain connected to the zip codes that are being compared and exchanged.
There are several ways to sort on more than one key, and all are of necessity more complex than simply sorting based on a single key. One example of a multi-key sort first puts all of the last names in order. Then within each group of identical last names the first names are sorted, and within each group of identical last and first names further sorting is performed on yet another key--perhaps Balance Due. As you can see, this requires you to sort based on differing types of data, and also to compare ranges of elements for the subgroups that need further sorting.
The biggest complication with this method is designing a calling syntax that lets you specify all of the information. A table array must be established to hold the number of keys, the type of data in each key (string, double precision, and so forth), and how many bytes into the TYPE element each key portion begins. Worse, you can't simply use the name of a TYPE component in the comparisons inside the sort routine--which would you use: Array(X).LastName, Array(X).FirstName, or Array(X).ZipCode? Therefore, a truly general multi-key sort must be called passing the address where the array begins in memory, and a table of offsets beyond that address where each component being considered is located.
To avoid this added complexity I will instead show a different method that has only a few minor restrictions, but is much easier to design and understand. This method requires you to position each TYPE component into the key order you will sort on. You will also need to store all numbers that will be used for a sort key as ASCII digits. To sort first on last name, then first name, and then on balance due, the TYPE might be structured as follows:
TYPE Customer LastName AS STRING * 15 FirstName AS STRING * 15 BalanceDue AS STRING * 9 Street AS STRING * 32 City AS STRING * 15 State AS STRING * 2 ZipCode AS STRING * 5 AnyNumber AS DOUBLE END TYPE
In most cases the order in which each TYPE member is placed has no consequence. When you refer to TypeVar.LastName, BASIC doesn't care if LastName is defined before or after FirstName in the TYPE structure. Either way it translates your reference to LastName into an address. Having to store numeric data as strings is a limitation, but this is needed only for those TYPE fields that will be used as a sort key.
The key to sorting on multiple items simultaneously is by treating the contiguous fields as a single long field. Since assignments to the string portion of a TYPE variable are handled internally by BASIC's LSET routine, the data in each element will be aligned such that subsequent fields can be treated as an extension of the primary field. Figure 8-2 below shows five TYPE array elements in succession, as they would be viewed by a string comparison routine. This data is defined as a subset of the name and address TYPE shown above, using just the first three fields. Notice that the balance due fields must be right-aligned (using RSET) for the numeric values to be considered correctly.
Type.LastName Type.FirstName Type.BalanceDue ===============---------------========= Munro Jay 8000.00 Smith John 122.03 Johnson Alfred 14537.89 Rasmussen Peter 100.90 Hudson Cindy 21.22 ^ ^ ^ Field 1 Field 2 Field 3 starts here starts here starts here
Thus, the sort routine would be told to start at the first field, and consider the strings to be 15 + 15 + 9 = 39 characters long. This way all three fields are compared at one time, and treated as a single entity. Additional fields can of course follow these, and they may be included in the comparison or not at your option.
The combination demonstration and subroutine below sorts such a TYPE array on any number of keys using this method, and it has a few additional features as well. Besides letting you confine the sorting to just a portion of the array, you may also specify how far into each element the first key is located. As long as the key fields are contiguous, they do not have to begin at the start of each TYPE. Therefore, you could sort just on the first name field, or on any other field or group of fields.
'TYPESORT.BAS - performs a multi-key sort on TYPE arrays 'Copyright (c) 1991 Ethan Winer DEFINT A-Z DECLARE FUNCTION Compare3% (BYVAL Seg1, BYVAL Adr1, BYVAL Seg2, _ BYVAL Adr2, NumBytes) DECLARE SUB SwapMem (BYVAL Seg1, BYVAL Adr1, BYVAL Seg2, BYVAL Adr2, _ BYVAL Length) DECLARE SUB TypeSort (Segment, Address, ElSize, Offset, KeySize, NumEls) CONST NumEls% = 23 'this keeps it all on the screen TYPE MyType LastName AS STRING * 10 FirstName AS STRING * 10 Dollars AS STRING * 6 Cents AS STRING * 2 END TYPE REDIM Array(1 TO NumEls%) AS MyType '---- Disable (REM out) all but one of the following blocks to test Offset = 27 'start sorting with Cents ElSize = LEN(Array(1)) 'the length of each element KeySize = 2 'sort on the Cents only Offset = 21 'start sorting with Dollars ElSize = LEN(Array(1)) 'the length of each element KeySize = 8 'sort Dollars and Cents only Offset = 11 'start sorting with FirstName ElSize = LEN(Array(1)) 'the length of each element KeySize = 18 'sort FirstName through Cents Offset = 1 'start sorting with LastName ElSize = LEN(Array(1)) 'the length of each element KeySize = ElSize 'sort based on all 4 fields FOR X = 1 TO NumEls% 'build the array from DATA READ Array(X).LastName READ Array(X).FirstName READ Amount$ 'format the amount into money Dot = INSTR(Amount$, ".") IF Dot THEN RSET Array(X).Dollars = LEFT$(Amount$, Dot - 1) Array(X).Cents = LEFT$(MID$(Amount$, Dot + 1) + "00", 2) ELSE RSET Array(X).Dollars = Amount$ Array(X).Cents = "00" END IF NEXT Segment = VARSEG(Array(1)) 'show where the array is Address = VARPTR(Array(1)) ' located in memory CALL TypeSort(Segment, Address, ElSize, Offset, KeySize, NumEls%) CLS 'display the results FOR X = 1 TO NumEls% PRINT Array(X).LastName, Array(X).FirstName, PRINT Array(X).Dollars; "."; Array(X).Cents NEXT DATA Smith, John, 123.45 DATA Cramer, Phil, 11.51 DATA Hogan, Edward, 296.08 DATA Cramer, Phil, 112.01 DATA Malin, Donald, 13.45 DATA Cramer, Phil, 111.3 DATA Smith, Ralph, 123.22 DATA Smith, John, 112.01 DATA Hogan, Edward, 8999.04 DATA Hogan, Edward, 8999.05 DATA Smith, Bob, 123.45 DATA Cramer, Phil, 11.50 DATA Hogan, Edward, 296.88 DATA Malin, Donald, 13.01 DATA Cramer, Phil, 111.1 DATA Smith, Ralph, 123.07 DATA Smith, John, 112.01 DATA Hogan, Edward, 8999.33 DATA Hogan, Edward, 8999.17 DATA Hogan, Edward, 8999.24 DATA Smith, John, 123.05 DATA Cramer, David, 1908.80 DATA Cramer, Phil, 112 END SUB TypeSort (Segment, Address, ElSize, Displace, KeySize, NumEls) STATIC REDIM QStack(NumEls \ 5 + 10) 'create a stack array First = 1 'initialize working variables Last = NumEls Offset = Displace - 1 'decrement once now rather than ' repeatedly later DO DO Temp = (Last + First) \ 2 'seek midpoint I = First J = Last DO WHILE Compare3%(Segment, Address + Offset + (I - 1) * ElSize, Segment, _ Address + Offset + (Temp-1) * ElSize, KeySize) = -1 '< 1 for descending I = I + 1 WEND WHILE Compare3%(Segment, Address + Offset + (J - 1) * ElSize, Segment, _ Address + Offset + (Temp-1) * ElSize, KeySize) = 1 '< -1 for descending J = J - 1 WEND IF I > J THEN EXIT DO IF I < J THEN CALL SwapMem(Segment, Address + (I - 1) * ElSize, Segment, _ Address + (J - 1) * ElSize, ElSize) IF Temp = I THEN Temp = J ELSEIF Temp = J THEN Temp = I END IF END IF I = I + 1 J = J - 1 LOOP WHILE I <= J IF I < Last THEN QStack(StackPtr) = I 'Push I QStack(StackPtr + 1) = Last 'Push Last StackPtr = StackPtr + 2 END IF Last = J LOOP WHILE First < Last IF StackPtr = 0 THEN EXIT DO 'Done StackPtr = StackPtr - 2 First = QStack(StackPtr) 'Pop First Last = QStack(StackPtr + 1) 'Pop Last LOOP ERASE QStack 'delete the stack array END SUB
As you can see, this version of the Quick Sort subprogram is derived from the one shown earlier. The important difference is that all of the incoming information is passed as segments, addresses, and bytes, rather than using an explicit array name. But before describing the inner details of the subprogram itself, I'll address the demonstration portion and show how the routine is set up and called.
As with some of the other procedures on the disk that comes with this book, you will extract the TypeSort subprogram and add it to your own programs by loading it as a module, and then using the Move option of BASIC's View Subs menu. You can quickly access this menu by pressing F2, and then use Alt-M to select Move. Once this is done you will unload TYPESORT.BAS using the Alt-F-U menu selection, and answer *No* when asked if you want to save the modified file. You could also copy the TypeSort subprogram into a separate file, and then load that file as a module in each program that needs it.
Although the example TYPE definition here shows only four components, you may of course use any TYPE structure. TypeSort expects six parameters to tell it where in memory the array is located, how far into each element the comparison routines are to begin, the total length of each element, the length of the key fields, and the number of elements to sort.
After defining MyType, the setup portion of TYPESORT.BAS establishes the offset, element size, and key size parameters. As you can see, four different sample setups are provided, and you should add remarking apostrophes to all but one of them. If the program is left as is, the last setup values will take precedence.
The next section reads sample names, addresses and dollar amounts from DATA statements, and formats the dollar amounts as described earlier. The dollar portion of the amounts are right justified into the Dollars field of each element, and the Cents portion is padded with trailing zeros as necessary to provide a dollars and cents format. This way, the value 12.3 will be assigned as 12.30, and 123 will be formatted to 123.00 which gives the expected appearance.
The final setup step is to determine where the array begins in memory. Since you specify the starting segment and address, it is simple to begin sorting at any array element. For example, to sort elements 100 through 200--even if the array is larger than that--you'd use VARSEG(Array(100)) and VARPTR(Array(100) instead of element 1 as shown in this example.
In addition to the starting segment and address of the array, TypeSort also requires you to tell it how many elements to consider. If you are sorting the entire array and the array starts with element 1, this will simply be the UBOUND of the array. If you are sorting just a portion of the array then you give it only the number of elements to be sorted. So to sort elements 100 through 200, the number of elements will be 101. A general formula you can use for calculating this based on element numbers is NumElements = LastElement - FirstElement + 1.
Now let's consider the TypeSort subprogram itself. Since it is more like the earlier QSort program than different, I will cover only the differences here. In fact, the primary difference is in the way comparisons and exchanges are handled. The Compare3 function introduced earlier is used to compare the array elements with the midpoint. Although QSort made a temporary copy of the midpoint element, that would be difficult to do here. Since the routine is designed to work with any type of data--and the size of each element can vary depending on the TYPE structure--it is impractical to make a copy.
While SPACE$ could be used to claim a block of memory into which the midpoint element is copied, there's a much better way: the Temp variable is used to remember the element number itself. The only complication is that once elements I and J are swapped, Temp must be reassigned if it was equal to either of them. (This happens just below the call to SwapMem.) But the simple integer IF test and assignment required adds far less code and is much faster than making a copy of the element.
TypeSort is designed to sort the array in ascending order, and comments in the code show how to change it to sort descending instead. If you prefer to have one subprogram that can do both, you should add an extra parameter, perhaps called Direction. Near the beginning of the routine before the initial outer DO you would add this:
IF Direction = 0 THEN 'sort ascending ICompare = -1 JCompare = 1 ELSE 'sort descending ICompare = 1 JCompare = -1 END IF
Then, where the results from Compare3 are compared to -1 and 1 replace those comparisons (at the end of each WHILE line) to instead use ICompare and JCompare:
WHILE Compare3%(...) = ICompare I = I + 1 WEND WHILE Compare3%(...) = JCompare J = J - 1 WEND
This way, you are using variables to establish the sorting direction, and those variables can be set either way each time TypeSort is called.
The last major difference is that elements are exchanged using the SwapMem routine rather than BASIC's SWAP statement. While it is possible to call SWAP by aliasing its name as shown in Chapter 5, it was frankly simpler to write a new routine for this purpose. Further, BASIC's SWAP is slower than SwapMem because it must be able to handle variables of different lengths, and also exchange fixed-length and conventional strings. SwapMem is extremely simple, and it works very quickly.
As I stated earlier, the only way to write a truly generic sort routine is by passing segments and addresses and bytes, instead of array names. Although it would be great if BASIC could let you declare a subprogram or function using the AS ANY option to allow any type of data, that simply wouldn't work. As BASIC compiles your program, it needs to know the size and type of each parameter. When you reference TypeVar.LastName, BASIC knows where within TypeVar the LastName portion begins, and uses that in its address calculations. It is not possible to avoid this limitation other than by using addresses as is done here.
Indeed, this is the stuff that C and assembly language programs are made of. In these languages--especially assembly language--integer pointer variables are used extensively to show where data is located and how long it is. However, the formulas used within the Compare3 and SwapMem function calls are not at all difficult to understand.
The formula Address + Offset - (I - 1) * ElSize indicates where the key field of element I begins. Address holds the address of the beginning of the first element, and Offset is added to identify the start of the first sort key. (I - 1) is used instead of I because addresses are always zero- based. That is, the first element in the array from TypeSort's perspective is element 0, even though the calling program considers it to be element 1. Finally, the element number is multiplied times the length of each element, to determine the value that must be added to the starting address and offset to obtain the final address for the data in element I. Please understand that calculations such as these are what the compiler must do each time you access an array element.
Note that if you call TypeSort incorrectly or give it illegal element numbers, you will not receive a "Subscript out of range" error from BASIC. Rather, you will surely crash your PC and have to reboot. This is the danger--and fun--of manipulating pointers directly.
As I stated earlier, the SwapMem routine that does the actual exchanging of elements is very simple, and it merely takes a byte from one element and exchanges it with the corresponding byte in the other. This task is greatly simplified by the use of the XCHG assembly language command, which is similar to BASIC's SWAP statement. Although XCHG cannot swap a word in memory with another word in memory, it can exchange memory with a register. SwapMem is shown in the listing below.
;SWAPMEM.ASM, swaps two sections of memory .Model Medium, Basic .Code SwapMem Proc Uses SI DI DS ES, Var1:DWord, Var2:DWord, NumBytes:Word Lds SI,Var1 ;get the segmented address of the ; first variable Les DI,Var2 ;and for the second variable Mov CX,NumBytes ;get the number of bytes to exchange Jcxz Exit ;we can't swap zero bytes! DoSwap: Mov AL,ES:[DI] ;get a byte from the second variable Xchg AL,[SI] ;swap it with the first variable Stosb ;complete the swap and increment DI Inc SI ;point to the next source byte Loop DoSwap ;continue until done Exit: Ret ;return to BASIC SwapMem Endp End
Earlier I showed how to modify the simple Bubble Sort routine to sort a parallel index array instead of the primary array. One important reason you might want to do that is to allow access to the primary array in both its original and sorted order. Another reason, and one we will get to shortly, is to facilitate sorting disk files. Although a routine to sort the records in a file could swap the actual data, it takes a long time to read and write that much data on disk. Further, each time you wanted to access the data sorted on a different key, the entire file would need to be sorted again.
A much better solution is to create one or more sorted lists of record numbers, and store those on disk each in a separate file. This lets you access the data sorted by name, or by zip code, or by any other field, without ever changing the actual file. The TypeISort subprogram below is adapted from TypeSort, and it sorts an index array that holds the element numbers of a TYPE array.
'TYPISORT.BAS, indexed multi-key sort for TYPE arrays DEFINT A-Z DECLARE FUNCTION Compare3% (BYVAL Seg1, BYVAL Adr1, BYVAL Seg2, _ BYVAL Adr2, NumBytes) DECLARE SUB SwapMem (BYVAL Seg1, BYVAL Adr1, BYVAL Seg2, _ BYVAL Adr2, BYVAL Length) DECLARE SUB TypeISort (Segment, Address, ElSize, Offset, KeySize, _ NumEls, Index()) CONST NumEls% = 23 'this keeps it all on the screen TYPE MyType LastName AS STRING * 10 FirstName AS STRING * 10 Dollars AS STRING * 6 Cents AS STRING * 2 END TYPE REDIM Array(1 TO NumEls%) AS MyType REDIM Index(1 TO NumEls%) 'create the index array Offset = 1 'start sorting with LastName ElSize = LEN(Array(1)) 'the length of each element KeySize = ElSize 'sort based on all 4 fields FOR X = 1 TO NumEls% 'build the array from DATA READ Array(X).LastName READ Array(X).FirstName READ Amount$ ... 'this continues as already ... ' shown in TypeSort NEXT FOR X = 1 TO NumEls% 'initialize the index Index(X) = X - 1 'but starting with 0 NEXT Segment = VARSEG(Array(1)) 'show where the array is Address = VARPTR(Array(1)) ' located in memory CALL TypeISort(Segment, Address, ElSize, Offset, KeySize, NumEls%, Index()) CLS 'display the results FOR X = 1 TO NumEls% '+ 1 adjusts to one-based PRINT Array(Index(X) + 1).LastName, PRINT Array(Index(X) + 1).FirstName, PRINT Array(Index(X) + 1).Dollars; "."; PRINT Array(Index(X) + 1).Cents NEXT DATA Smith, John, 123.45 'this continues as already ... ' shown in TypeSort ... END SUB TypeISort (Segment, Address, ElSize, Displace, KeySize, NumEls, _ Index()) STATIC REDIM QStack(NumEls \ 5 + 10) 'create a stack First = 1 'initialize working variables Last = NumEls Offset = Displace - 1 'make zero-based now for speed later DO DO Temp = (Last + First) \ 2 'seek midpoint I = First J = Last DO 'change -1 to 1 and 1 to -1 to sort descending WHILE Compare3%(Segment, Address + Offset + (Index(I) * ElSize), _ Segment, Address + Offset + (Index(Temp) * ElSize), KeySize) = -1 I = I + 1 WEND WHILE Compare3%(Segment, Address + Offset + (Index(J) * ElSize), _ Segment, Address + Offset + (Index(Temp) * ElSize), KeySize) = 1 J = J - 1 WEND IF I > J THEN EXIT DO IF I < J THEN SWAP Index(I), Index(J) IF Temp = I THEN Temp = J ELSEIF Temp = J THEN Temp = I END IF END IF I = I + 1 J = J - 1 LOOP WHILE I <= J IF I < Last THEN QStack(StackPtr) = I 'Push I QStack(StackPtr + 1) = Last 'Push Last StackPtr = StackPtr + 2 END IF Last = J LOOP WHILE First < Last IF StackPtr = 0 THEN EXIT DO 'Done StackPtr = StackPtr - 2 First = QStack(StackPtr) 'Pop First Last = QStack(StackPtr + 1) 'Pop Last LOOP ERASE QStack 'delete the stack array END SUB
As with TypeSort, TypeISort is entirely pointer based so it can be used with any type of data and it can sort multiple contiguous keys. The only real difference is the addition of the Index() array parameter, and the extra level of indirection needed to access the index array each time a comparison is made. Also, when a swap is required, only the integer index elements are exchanged, which simplifies the code and reduces its size. Like TypeSort, you can change the sort direction by reversing the -1 and 1 values used with Compare3, or add a Direction parameter to the list and modify the code to use that.
As with BubbleISort, the index array is initialized to increasing values by the calling program; however, here the first element is set to hold a value of 0 instead of 1. This reduces the calculations needed within the routine each time an address must be obtained. Therefore, when TypeISort returns, the caller must add 1 to the element number held in each index element. This is shown within the FOR/NEXT loop that displays the sorted results.
With the development of TypeISort complete, we can now use that routine to sort disk files. The sorting strategy will be to determine how many records are in the file, to determine how many separate passes are needed to process the entire file. TypeISort and TypeSort are restricted to working with arrays no larger than 64K (32K in the editing environment), so there is a limit as to how much data may be loaded into memory at one time. These sort routines can accommodate more data when compiled because address calculations that result in values larger than 32767 cause an overflow error in the QB editor. This overflow is in fact harmless, and is ignored in a compiled program unless you use the /d switch.
Although the routines could be modified to perform segment and address arithmetic to accommodate larger arrays, that still wouldn't solve the problem of having more records than can fit in memory at once. Therefore, separate passes must be used to sort the file contents in sections, with each pass writing a temporary index file to disk. A final merge pass then reads each index to determine which pieces fits where, and then writes the final index file. The program FILESORT.BAS below incorporates all of the sorting techniques shown so far, and includes a few custom BASIC routines to improve its performance.
'FILESORT.BAS, indexed multi-key random access file sort DEFINT A-Z DECLARE FUNCTION Compare3% (BYVAL Seg1, BYVAL Adr1, BYVAL Seg2, _ BYVAL Adr2, NumBytes) DECLARE FUNCTION Exist% (FileSpec$) DECLARE SUB DOSInt (Registers AS ANY) DECLARE SUB FileSort (FileName$, NDXName$, RecLength, Offset, KeySize) DECLARE SUB LoadFile (FileNum, Segment, Address, Bytes&) DECLARE SUB SaveFile (FileNum, Segment, Address, Bytes&) DECLARE SUB SwapMem (BYVAL Seg1, BYVAL Adr1, BYVAL Seg2, BYVAL Adr2, _ BYVAL Length) DECLARE SUB TypeISort (Segment, Address, ElSize, Offset, KeySize, _ NumEls, Index()) RANDOMIZE TIMER 'create new data each run DEF FnRand% = INT(RND * 10 + 1) 'returns RND from 1 to 10 TYPE RegType 'used by DOSInt AX AS INTEGER BX AS INTEGER CX AS INTEGER DX AS INTEGER BP AS INTEGER SI AS INTEGER DI AS INTEGER FL AS INTEGER DS AS INTEGER ES AS INTEGER END TYPE DIM SHARED Registers AS RegType 'share among all subs REDIM LastNames$(1 TO 10) 'we'll select names at REDIM FirstNames$(1 TO 10) ' random from these NumRecords = 2988 'how many test records to use FileName$ = "TEST.DAT" 'really original, eh? NDXName$ = "TEST.NDX" 'this is the index file name TYPE RecType LastName AS STRING * 11 FirstName AS STRING * 10 Dollars AS STRING * 6 Cents AS STRING * 2 AnyNumber AS LONG 'this shows that only key OtherNum AS LONG ' information must be ASCII END TYPE FOR X = 1 TO 10 'read the possible last names READ LastNames$(X) NEXT FOR X = 1 TO 10 'and the possible first names READ FirstNames$(X) NEXT DIM RecordVar AS RecType 'to create the sample file RecLength = LEN(RecordVar) 'the length of each record CLS PRINT "Creating a test file..." IF Exist%(FileName$) THEN 'if there's an existing file KILL FileName$ 'kill the old data from prior END IF ' runs to start fresh IF Exist%(NDXName$) THEN 'same for any old index file KILL NDXName$ END IF '---- Create some test data and write it to the file OPEN FileName$ FOR RANDOM AS #1 LEN = RecLength FOR X = 1 TO NumRecords RecordVar.LastName = LastNames$(FnRand%) RecordVar.FirstName = FirstNames$(FnRand%) Amount$ = STR$(RND * 10000) Dot = INSTR(Amount$, ".") IF Dot THEN RSET RecordVar.Dollars = LEFT$(Amount$, Dot - 1) RecordVar.Cents = LEFT$(MID$(Amount$, Dot + 1) + "00", 2) ELSE RSET RecordVar.Dollars = Amount$ RecordVar.Cents = "00" END IF RecordVar.AnyNumber = X PUT #1, , RecordVar NEXT CLOSE '----- Created a sorted index based on the main data file Offset = 1 'start sorting with LastName KeySize = 29 'sort based on first 4 fields PRINT "Sorting..." CALL FileSort(FileName$, NDXName$, RecLength, Offset, KeySize) '----- Display the results CLS VIEW PRINT 1 TO 24 LOCATE 25, 1 COLOR 15 PRINT "Press any key to pause/resume"; COLOR 7 LOCATE 1, 1 OPEN FileName$ FOR RANDOM AS #1 LEN = RecLength OPEN NDXName$ FOR BINARY AS #2 FOR X = 1 TO NumRecords GET #2, , ThisRecord 'get next rec. number GET #1, ThisRecord, RecordVar 'then the actual data PRINT RecordVar.LastName; 'print each field PRINT RecordVar.FirstName; PRINT RecordVar.Dollars; "."; PRINT RecordVar.Cents IF LEN(INKEY$) THEN 'pause on a keypress WHILE LEN(INKEY$) = 0: WEND END IF NEXT CLOSE VIEW PRINT 1 TO 24 'restore the screen END DATA Smith, Cramer, Malin, Munro, Passarelli DATA Bly, Osborn, Pagliaro, Garcia, Winer DATA John, Phil, Paul, Anne, Jacki DATA Patricia, Ethan, Donald, Tami, Elli END FUNCTION Exist% (Spec$) STATIC 'reports if a file exists DIM DTA AS STRING * 44 'the work area for DOS DIM LocalSpec AS STRING * 60 'guarantee the spec is in LocalSpec$ = Spec$ + CHR$(0) ' DGROUP for BASIC PDS Exist% = -1 'assume true for now Registers.AX = &H1A00 'assign DTA service Registers.DX = VARPTR(DTA) 'show DOS where to place it Registers.DS = VARSEG(DTA) CALL DOSInt(Registers) Registers.AX = &H4E00 'find first matching file Registers.CX = 39 'any file attribute okay Registers.DX = VARPTR(LocalSpec) Registers.DS = VARSEG(LocalSpec) CALL DOSInt(Registers) 'see if there's a match IF Registers.FL AND 1 THEN 'if the Carry flag is set Exist% = 0 ' there were no matches END IF END FUNCTION SUB FileSort (FileName$, NDXName$, RecLength, Displace, KeySize) STATIC CONST BufSize% = 32767 'holds the data being sorted Offset = Displace - 1 'make zero-based for speed later '----- Open the main data file FileNum = FREEFILE OPEN FileName$ FOR BINARY AS #FileNum '----- Calculate the important values we'll need NumRecords = LOF(FileNum) \ RecLength RecsPerPass = BufSize% \ RecLength IF RecsPerPass > NumRecords THEN RecsPerPass = NumRecords NumPasses = (NumRecords \ RecsPerPass) - ((NumRecords MOD RecsPerPass) _ <> 0) IF NumPasses = 1 THEN RecsLastPass = RecsPerPass ELSE RecsLastPass = NumRecords MOD RecsPerPass END IF '----- Create the buffer and index sorting arrays REDIM Buffer(1 TO 1) AS STRING * BufSize REDIM Index(1 TO RecsPerPass) IndexAdjust = 1 '----- Process all of the records in manageable groups FOR X = 1 TO NumPasses IF X < NumPasses THEN 'if not the last pass RecsThisPass = RecsPerPass 'do the full complement ELSE 'the last pass may have RecsThisPass = RecsLastPass ' fewer records to do END IF FOR Y = 1 TO RecsThisPass 'initialize the index Index(Y) = Y - 1 'starting with value of 0 NEXT '----- Load a portion of the main data file Segment = VARSEG(Buffer(1)) 'show where the buffer is CALL LoadFile(FileNum, Segment, Zero, RecsThisPass * CLNG(RecLength)) CALL TypeISort(Segment, Zero, RecLength, Displace, KeySize, _ RecsThisPass, Index()) '----- Adjust the zero-based index to record numbers FOR Y = 1 TO RecsThisPass Index(Y) = Index(Y) + IndexAdjust NEXT '----- Save the index file for this pass TempNum = FREEFILE OPEN "$$PASS." + LTRIM$(STR$(X)) FOR OUTPUT AS #TempNum CALL SaveFile(TempNum, VARSEG(Index(1)), Zero, RecsThisPass * 2&) CLOSE #TempNum '----- The next group of record numbers start this much higher IndexAdjust = IndexAdjust + RecsThisPass NEXT ERASE Buffer, Index 'free up the memory '----- Do a final merge pass if necessary IF NumPasses > 1 THEN NDXNumber = FREEFILE OPEN NDXName$ FOR BINARY AS #NDXNumber REDIM FileNums(NumPasses) 'this holds the file numbers REDIM RecordNums(NumPasses) 'this holds record numbers REDIM MainRec$(1 TO NumPasses) 'holds main record data REDIM Remaining(1 TO NumPasses) 'tracks index files '----- Open the files and seed the first round of data FOR X = 1 TO NumPasses FileNums(X) = FREEFILE OPEN "$$PASS." + LTRIM$(STR$(X)) FOR BINARY AS #FileNums(X) Remaining(X) = LOF(FileNums(X)) 'this is what remains MainRec$(X) = SPACE$(RecLength) 'holds main data file GET #FileNums(X), , RecordNums(X) 'get the next record number RecOffset& = (RecordNums(X) - 1) * CLNG(RecLength) + 1 GET #FileNum, RecOffset&, MainRec$(X) 'then get the data NEXT FOR X = 1 TO NumRecords Lowest = 1 'assume this is the lowest data in the group WHILE Remaining(Lowest) = 0 'Lowest can't refer to a dead index Lowest = Lowest + 1 'so seek to the next higher active index WEND FOR Y = 2 TO NumPasses 'now seek out the truly lowest element IF Remaining(Y) THEN 'consider only active indexes IF Compare3%(SSEG(MainRec$(Y)), _ '<-- use VARSEG with QB SADD(MainRec$(Y)) + Offset, _ SSEG(MainRec$(Lowest)), _ '<-- use VARSEG with QB SADD(MainRec$(Lowest)) + Offset, KeySize) = -1 THEN Lowest = Y END IF END IF NEXT PUT #NDXNumber, , RecordNums(Lowest) 'write the main index Remaining(Lowest) = Remaining(Lowest) - 2 IF Remaining(Lowest) THEN 'if the index is still active GET #FileNums(Lowest), , RecordNums(Lowest) RecOffset& = (RecordNums(Lowest) - 1) * CLNG(RecLength) + 1 GET #FileNum, RecOffset&, MainRec$(Lowest) END IF NEXT ELSE '----- Only one pass was needed so simply rename the index file NAME "$$PASS.1" AS NDXName$ END IF CLOSE 'close all open files IF Exist%("$$PASS.*") THEN 'ensure there's a file to kill KILL "$$PASS.*" 'kill the work files END IF ERASE FileNums, RecordNums 'erase the work arrays ERASE MainRec$, Remaining END SUB SUB LoadFile (FileNum, Segment, Address, Bytes&) STATIC IF Bytes& > 32767 THEN Bytes& = Bytes& - 65536 Registers.AX = &H3F00 'read from file service Registers.BX = FILEATTR(FileNum, 2) 'get the DOS handle Registers.CX = Bytes& 'how many bytes to load Registers.DX = Address 'and at what address Registers.DS = Segment 'and at what segment CALL DOSInt(Registers) END SUB SUB SaveFile (FileNum, Segment, Address, Bytes&) STATIC IF Bytes& > 32767 THEN Bytes& = Bytes& - 65536 Registers.AX = &H4000 'write to file service Registers.BX = FILEATTR(FileNum, 2) 'get the DOS handle Registers.CX = Bytes& 'how many bytes to load Registers.DX = Address 'and at what address Registers.DS = Segment 'and at what segment CALL DOSInt(Registers) END SUB SUB TypeISort (....) STATIC 'as shown in TYPISORT.BAS END SUB
FILESORT.BAS begins by defining a function that returns a random number between 1 and 10. Although the earlier sort demonstrations simply read the test data from DATA statements, that is impractical when creating thousands of records. Instead, two arrays are filled--one with ten last names and another with ten first names--and these names are drawn from at random. The Registers TYPE variable that is defined is used by three of the supporting routines in this program. RegType is normally associated with CALL Interrupt and InterruptX, but I have written a small-code replacement to mimic InterruptX that works with DOS Interrupt &H21 only. DOSInt accepts just a single Registers argument, instead of the three parameters that BASIC's Interrupt and InterruptX require. Besides adding less code each time it is used, the routine itself is smaller and simpler than InterruptX.
The remainder of the demonstration program should be easy to follow, so I won't belabor its operation; the real action is in the FileSort subprogram.
Like TypeSort and TypeISort, FileSort is entirely pointer based, to accommodate TYPE elements of any size and structure. You provide the name of the main data file to be sorted, the name of an index file to create, and the length and offset of the keys within the disk records. The Displace parameter tells how far into the TYPE structure the key information is located. When calling TypeISort this value is should be one-based, but in the final merge pass where Compare3 is used, a zero-based number is required. Therefore, a copy is made (Offset = Displace - 1) near the beginning of the routine. This way, both are available quickly without having to calculate - 1 repeatedly slowing its operation.
The initial steps FileSort performs are to determine how many records are in the data file, and from that how many records can fit into memory at one time. Once these are known, the number of passes necessary can be easily calculated. An extra step is needed to ensure that RecsPerPass is not greater than the number of records in the file. Just because 200 records can fit into memory at once doesn't mean there are that many records. In most cases where multiple passes are needed the last pass will process fewer records than the others. If there are, say, 700 records and each pass can sort 300, the last pass will sort only 100 records.
Once the pass information is determined, a block of memory is created to hold each portion of the file for sorting. This is the purpose of the Buffer array. REDIM is used to create a 32K chunk of memory that doesn't impinge on available string space.
For each pass that is needed, the number of records in the current pass is determined and the index array is initialized to increasing values. Then, a portion of the main data file is read using the LoadFile subprogram. BASIC does not allow you to read records from a random access file directly into a buffer specified by its address. And even if it did, we can load data much faster than pure BASIC by reading a number of records all at once.
Once the current block of records has been loaded, TypeISort is called to sort the index array. The index array is also saved very quickly using SaveFile, which is the compliment to LoadFile. A unique name is given to each temporary index file such that the first one is named $$PASS.1, the second $$PASS.2, and so forth. By using dollar signs in the name it is unlikely that the routine will overwrite an existing file from another application. Of course, you may change the names to anything else if you prefer.
Notice the extra step that manipulates the IndexAdjust variable. This adjustment is needed because each sort pass returns the index array holding record numbers starting at 0. The first time through, 1 must be added to each element to reflect BASIC's use of record numbers that start at 1. If the first pass sorts, say, 250 records, then the index values 1 through 250 are saved to disk. But the second pass is processing records 251 through 500, so an adjustment value of 251 must be added to each element prior to writing it to disk.
If the data file is small and only one pass was needed, the $$PASS.1 file is simply renamed to whatever the caller specified. Otherwise, a merge pass is needed to determine which record number is the next in sequence based on the results of each pass. Believe it or not, this is the trickiest portion of the entire program. For the sake of discussion, we'll assume that four passes were required to sort the file.
Each of the four index files contains a sequence of record numbers, and all of the records within that sequence are in sorted order. However, there is no relationship between the data records identified in one index file and those in another. Thus, each index file and corresponding data record must be read in turn. A FOR/NEXT loop then compares each of the four records, to see which is truly next in the final sequence. The complication arises as the merge nears completion, because some of the indexes will have become exhausted. This possibility is handled by the Remaining array.
Elements in the Remaining array are initialized to the length of each index file as the indexes are opened. Then, as each index entry is read from disk, the corresponding element is decremented by two to show that another record number was read. Therefore, the current Remaining element must be checked to see if that index has been exhausted. Otherwise, data that was already processed might be considered in the merge comparisons.
The final steps are to close all the open files, delete the temporary index files, and erase the work arrays to free the memory they occupied.
One important point to observe is the use of SSEG to show Compare3 where the MainRec$ elements are located. SSEG is for BASIC 7 only; if you are using QuickBASIC you must change SSEG to VARSEG. SSEG can be used with either near or far strings in BASIC 7, but VARSEG works with near strings only. SSEG is used as the default, so an error will be reported if you are using QuickBASIC. The cursor will then be placed near the comment in the program that shows the appropriate correction.
As with sorting, searching data effectively also requires that you select an appropriate algorithm. There are many ways to search data, and we will look at several methods here. The easiest to understand is a linear search, which simply examines each item in sequence until a match is found:
FoundAt = 0 'assume no match FOR X = 1 TO NumElements 'search all elements IF Array$(X) = Sought$ THEN FoundAt = X 'remember where it is EXIT FOR 'no need to continue END IF NEXT IF FoundAt THEN 'if it was found PRINT "Found at element"; FoundAt ELSE PRINT "Not found" 'otherwise END IF
For small arrays a linear search is effective and usually fast enough. Also, integer and long integer arrays can be searched reasonably quickly even if there are many elements. But with string data, as the number of elements that must be searched increases, the search time can quickly become unacceptable. This is particularly true when additional features are required such as searching without regard to capitalization or comparing only a portion of each element using MID$. Indeed, many of the same techniques that enhance a sort routine can also be employed when searching.
To search ignoring capitalization you would first capitalize Sought$ outside of the loop, and then use UCASE$ with each element in the comparisons. Using UCASE$(Sought$) repeatedly within the loop is both wasteful and unnecessary:
Sought$ = UCASE$(Sought$) . . IF UCASE$(Array$(X)) = Sought$ THEN
Likewise, comparing only a portion of each string will require MID$ with each comparison, after using MID$ initially to extract what is needed from Sought$:
Sought$ = MID$(Sought$, 12, 6) . . IF MID$(Array$(X), 12, 6) = Sought$ THEN
And again, as with sorting, these changes may be combined in a variety of ways. You could even use INSTR to see if the string being searched for is within the array, when an exact match is not needed:
IF INSTR(UCASE$(Array$(X)), Sought$) THEN
However, each additional BASIC function you use will make the searching slower and slower. Although BASIC's INSTR is very fast, adding UCASE$ to each comparison as shown above slows the overall process considerably.
There are three primary ways that searching can be speeded up. One is to apply simple improvements based on understanding how BASIC works, and knowing which commands are fastest. The other is to select a better algorithm. The third is to translate selected portions of the search routine into assembly language. I will use all three of these techniques here, starting with enhancements to the linear search, and culminating with a very fast binary search for use with sorted data.
One of the slowest operations that BASIC performs is comparing strings. For each string, its descriptor address must be loaded and passed to the comparison routine. That routine must then obtain the actual data address, and examine each byte in both strings until one of the characters is different, or it determines that both strings are the same. As I mentioned earlier, if one or both of the strings are fixed-length, then copies also must be made before the comparison can be performed.
There is another service that the string comparison routine must perform, which is probably not obvious to most programmers and which also impacts its speed. BASIC frequently creates and then deletes temporary strings without your knowing it. One example is the copy it makes of fixed-length strings before comparing them. But there are other, more subtle situations in which this can happen.
For example, when you use IF X$ + Y$ > Z$ BASIC must create a temporary string comprised of X$ + Y$, and then pass that to the comparison routine. Therefore, that routine is also responsible for determining if the incoming string is a temporary copy, and deleting it if so. In fact, all of BASIC's internal routines that accept string arguments are required to do this.
Therefore, one good way to speed searching of conventional (not fixed- length) string arrays is to first compare the lengths. Since strings whose lengths are different can't possibly be the same, this will quickly weed those out. BASIC's LEN function is much faster than its string compare routine, and it offers a simple but effective opportunity to speed things up. LEN is made even faster because it requires only a single argument, as opposed to the two required for the comparison routine.
SLen = LEN(Sought$) 'do this once outside the loop FOR X = 1 TO NumElements IF LEN(Array$(X)) = SLen THEN 'maybe... IF Array$(X) = Sought$ THEN 'found it! FoundAt = X EXIT FOR END IF END IF NEXT
Similarly, if the first characters are not the same then the strings can't match either. Like LEN, BASIC's ASC is much faster than the full string comparison routine, and it too can improve search time by eliminating elements that can't possibly match. Depending on the type and distribution of the data in the array, using both LEN and ASCII can result in a very fast linear search:
SLen = LEN(Sought$) SAsc = ASC(Sought$) FOR X = 1 TO NumElements IF LEN(Array$(X)) = SLen THEN IF ASC(Array$(X)) = SAsc THEN IF Array$(X) = Sought$ THEN ... END IF END IF END IF NEXT
Notice that the LEN test must always be before the ASC test, to avoid an "Illegal function call" error if the array element is a null string. If all or most of the strings are the same length, then LEN will not be helpful, and ASC should be used alone.
As I mentioned before, when comparing fixed-length string arrays BASIC makes a copy of each element into a conventional string, prior to calling its comparison routine. This copying is also performed when using ASC is used, but not LEN. After all, the length of a fixed-length never changes, and BASIC is smart enough to know the length directly. But then, comparing the lengths of these string is pointless anyway.
Because of the added overhead to make these copies, the performance of a conventional linear search for fixed-length data is generally quite poor. This is a shame, because fixed-length strings are often the only choice when as much data as possible must be kept in memory at once. And fixed- length strings lend themselves perfectly to names and addresses. It should be apparent by now that the best solution for quickly comparing fixed- length string arrays--and the string portion of TYPE arrays too--is with the various Compare functions already shown.
If you are searching for an exact match, then either Compare or Compare2 will be ideal, depending on whether you want to ignore capitalization. If you have only a single string element in each array, you should define a dummy TYPE. This avoids the overhead of having to use both VARSEG and VARPTR as separate arguments. The short example program and SearchType functions that follow search a fixed-length string array for a match.
DEFINT A-Z DECLARE FUNCTION Compare% (SEG Type1 AS ANY, SEG Type2 AS ANY, NumBytes) DECLARE FUNCTION Compare2% (SEG Type1 AS ANY, SEG Type2 AS ANY, NumBytes) DECLARE FUNCTION SearchType% (Array() AS ANY, Sought AS ANY) DECLARE FUNCTION SearchType2% (Array() AS ANY, Sought AS ANY) DECLARE FUNCTION SearchType3% (Array() AS ANY, Searched AS ANY) CLS TYPE FLen 'this lets us use SEG LastName AS STRING * 15 END TYPE REDIM Array(1 TO 4000) AS FLen '4000 is a lot of names DIM Search AS FLen 'best comparing like data FOR X = 1 TO 4000 STEP 2 'impart some realism Array(X).LastName = "Henderson" NEXT Array(4000).LastName = "Henson" 'almost at the end Search.LastName = "Henson" 'find the same name '----- first time how long it takes using Compare Start! = TIMER 'start timing FOR X = 1 TO 5 'search five times FoundAt = SearchType%(Array(), Search) NEXT IF FoundAt >= 0 THEN PRINT "Found at element"; FoundAt ELSE PRINT "Not found" END IF Done! = TIMER PRINT USING "##.## seconds with Compare"; Done! - Start! PRINT '----- then time how long it takes using Compare2 Start! = TIMER 'start timing FOR X = 1 TO 5 'as above FoundAt = SearchType2%(Array(), Search) NEXT IF FoundAt >= 0 THEN PRINT "Found at element"; FoundAt ELSE PRINT "Not found" END IF Done! = TIMER PRINT USING "##.## seconds with Compare2"; Done! - Start! PRINT '---- finally, time how long it takes using pure BASIC Start! = TIMER FOR X = 1 TO 5 FoundAt = SearchType3%(Array(), Search) NEXT IF FoundAt >= 0 THEN PRINT "Found at element"; FoundAt ELSE PRINT "Not found" END IF Done! = TIMER PRINT USING "##.## seconds using BASIC"; Done! - Start! END FUNCTION SearchType% (Array() AS FLen, Sought AS FLen) STATIC SearchType% = -1 'assume not found FOR X = LBOUND(Array) TO UBOUND(Array) IF Compare%(Array(X), Sought, LEN(Sought)) THEN SearchType% = X 'save where it was found EXIT FOR 'and skip what remains END IF NEXT END FUNCTION FUNCTION SearchType2% (Array() AS FLen, Sought AS FLen) STATIC SearchType2% = -1 'assume not found FOR X = LBOUND(Array) TO UBOUND(Array) IF Compare2%(Array(X), Sought, LEN(Sought)) THEN SearchType2% = X 'save where it was found EXIT FOR 'and skip what remains END IF NEXT END FUNCTION FUNCTION SearchType3% (Array() AS FLen, Searched AS FLen) STATIC SearchType3% = -1 'assume not found FOR X = LBOUND(Array) TO UBOUND(Array) IF Array(X).LastName = Searched.LastName THEN SearchType3% = X 'save where it was found EXIT FOR 'and skip what remains END IF NEXT END FUNCTION
When you run this program it will be apparent that the SearchType function is the fastest, because it uses Compare which doesn't perform any case conversions. SearchType2 is only slightly slower with that added overhead, and the purely BASIC function, SearchType3, lags far behind at half the speed. Note that the array is searched five times in succession, to minimize the slight errors TIMER imposes. Longer timings are generally more accurate than short ones, because of the 1/18th second resolution of the PC's system timer.
This is about as far as we can go using linear searching, and to achieve higher performance requires a better algorithm. The Binary Search is one of the fastest available; however, it requires the data to already be in sorted order. A Binary Search can also be used with a sorted index, and both methods will be described.
Binary searches are very fast, and also very simple to understand. Unlike the Quick Sort algorithm which achieves great efficiency at the expense of being complicated, a Binary Search can be written using only a few lines of code. The strategy is to start the search at the middle of the array. If the value of that element value is less than that of the data being sought, a new halfway point is checked and the process repeated. This way, the routine can quickly zero in on the value being searched for. Figure 8-3 below shows how this works.
13: Zambia 12: Sweden 11: Peru 10: Mexico <-- step 2 9: Holland 8: Germany 7: Finland <-- step 1 6: England 5: Denmark 4: China 3: Canada 2: Austria 1: Australia
If you are searching for Mexico, the first element examined is number 7, which is halfway through the array. Comparing Mexico to Finland shows that Mexico is greater, so the distance is again cut in half. In this case, a match was found after only two tries--remarkably faster than a linear search that would have required ten comparisons. Even when huge arrays must be searched, data can often be found in a dozen or so tries. One interesting property of a binary search is that it takes no longer to find the last element in the array than the first one.
The program below shows one way to implement a Binary Search.
DEFINT A-Z DECLARE FUNCTION BinarySearch% (Array$(), Find$) CLS PRINT "Creating test data..." REDIM Array$(1 TO 1000) 'create a "sorted" array FOR X = 1 TO 1000 Array$(X) = "String " + RIGHT$("000" + LTRIM$(STR$(X)), 4) NEXT PRINT "Searching array..." FoundAt = BinarySearch%(Array$(), "String 0987") IF FoundAt >= 0 THEN PRINT "Found at element"; FoundAt ELSE PRINT "Not found" END IF END FUNCTION BinarySearch% (Array$(), Find$) STATIC BinarySearch% = -1 'no matching element yet Min = LBOUND(Array$) 'start at first element Max = UBOUND(Array$) 'consider through last DO Try = (Max + Min) \ 2 'start testing in middle IF Array$(Try) = Find$ THEN 'found it! BinarySearch% = Try 'return matching element EXIT DO 'all done END IF IF Array$(Try) > Find$ THEN 'too high, cut in half Max = Try - 1 ELSE Min = Try + 1 'too low, cut other way END IF LOOP WHILE Max >= Min END FUNCTION
The BinarySearch function returns either the element number where a match was found, or -1 if the search string was not found. Not using a value of zero to indicate failure lets you use arrays that start with element number 0. As you can see, the simplicity of this algorithm belies its incredible efficiency. The only real problem is that the data must already be in sorted order. Also notice that two string comparisons must be made--one to see if the strings are equal, and another to see if the current element is too high. Although you could use Compare3 which examines the strings once and tells if the data is the same or which is greater, a Binary Search is so fast that this probably isn't worth the added trouble. As you will see when you run the test program, it takes far longer to create the data than to search it!
Besides the usual enhancements that can be applied to the comparisons using UCASE$ or MID$, this function could also be structured to use a parallel index array. Assuming the data is not sorted but the index array is, the modified Binary Search would look like this:
FUNCTION BinaryISearch% (Array$(), Index(), Find$) STATIC BinaryISearch% = -1 'assume not found Min = LBOUND(Array$) 'start at first element Max = UBOUND(Array$) 'consider through last DO Try = (Max + Min) \ 2 'start testing in middle IF Array$(Index(Try)) = Find$ THEN 'found it! BinaryISearch% = Try 'return matching element EXIT DO 'all done END IF IF Array$(Index(Try)) > Find$ THEN 'too high, cut Max = Try - 1 ELSE Min = Try + 1 'too low, cut other way END IF LOOP WHILE Max >= Min END FUNCTION
All of the searching techniques considered so far have addressed string data. In most cases, string array searches are the ones that will benefit the most from improved techniques. As you have already seen, BASIC makes copies of fixed-length strings before comparing them, which slows down searching. And the very nature of strings implies that many bytes may have to be compared before determining if they are equal or which string is greater. In most cases, searching a numeric array is fast enough without requiring any added effort, especially when the data is integer or long integer.
However, a few aspects of numeric searching are worth mentioning here. One is avoiding the inevitable rounding errors that are sure to creep into the numbers you are examining. Another is that in many cases, you may not be looking for an exact match. For example, you may need to find the first element that is higher than a given value, or perhaps determine the smallest value in an array.
Unlike strings that are either the same or they aren't, the binary representation of numeric values is not always so precise. Consider the following test which *should* result in a match, but doesn't.
Value! = 1! Result! = 2! CLS FOR X = 1 TO 1000 Value! = Value! + .001 NEXT IF Value! = Result! THEN PRINT "They are equal" ELSE PRINT "Value! ="; Value! PRINT "Result! ="; Result! END IF
After adding .001 to Value! 1000 times Value! should be equal to 2, but instead it is slightly higher. This is because the binary storage method used by computers simply cannot represent every possible value with absolute accuracy. Even changing all of the single precision exclamation points (!) to double precision pound signs (#) will not solve the problem. Therefore, to find a given value in a numeric array can require some extra trickery.
What is really needed is to determine if the numbers are *very close* to each other, as opposed to exactly the same. One way to accomplish this is to subtract the two, and see if the result is very close to zero. This is shown below.
Value! = 1! Result! = 2! CLS FOR X = 1 TO 1000 Value! = Value! + .001 NEXT IF ABS(Value! - Result!) < .0001 THEN PRINT "They are equal" ELSE PRINT "Value! ="; Value! PRINT "Result! ="; Result! END IF
Here, the absolute value of the difference between the numbers is examined, and if that difference is very small the numbers are assumed to be the same. Unfortunately, the added overhead of subtracting before comparing slows the comparison even further. There is no simple cure for this, and an array search must apply this subtraction to each element that is examined.
Another common use for numeric array searches is when determining the largest or smallest value. Many programmers make the common mistake shown below when trying to find the largest value in an array.
MaxValue# = 0 FOR X = 1 TO NumElements IF Array#(X) > MaxValue# THEN MaxValue# = Array#(X) Element = X END IF NEXT PRINT "The largest value found is"; MaxValue# PRINT "And it was found at element"; Element
The problem with this routine is that it doesn't account for arrays where all of the elements are negative numbers! In that case no element will be greater than the initial MaxValue#, and the routine will incorrectly report zero as the result. The correct method is to obtain the lowest element value, and use that as a starting point:
MaxValue# = Array#(1) FOR X = 2 TO NumElements IF Array#(X) > MaxValue# THEN MaxValue# = Array#(X) END IF NEXT PRINT "The largest value found is"; MaxValue#
Determining the highest value in an array would be handled similarly, except the greater-than symbol (>) would be replaced with a less-than operator (<).
The final searching technique I will show is Soundex. It is often useful to search for data based on its sound, for example when you do not know how to spell a person's name. Soundex was invented in the 1920's and has been used since then by, among others, the U.S. Census Bureau. A Soundex code is an alpha-numeric representation of the sound of a word, and it is surprisingly accurate despite its simplicity. The classic implementation of Soundex returns a four-character result code. The first character is the same as the first letter of the word, and the other three are numeric digits coded as shown in Figure 8-4.
1 B, F, P, V 2 C, G, J, K, Q, S, X 3 D, T 4 L 5 M, N 6 R
Letters not shown are simply skipped as being statistically insignificant to the sound of the word. In particular, speaking accents often minimize the importance of vowels, and blur their distinction. If the string is short and there are fewer than four digits, the result is simply padded with trailing zeros. One additional rule is that a code digit is never repeated, unless there is an uncoded letter in between. In the listing that follows, two different implementations of Soundex are shown.
'SOUNDEX.BAS, Soundex routines and example DEFINT A-Z DECLARE FUNCTION ASoundex$ (Word$) DECLARE FUNCTION ISoundex% (Word$) CLS DO PRINT "press Enter alone to exit" INPUT "What is the first word"; FWord$ IF LEN(FWord$) = 0 THEN EXIT DO INPUT "What is the second word"; SWord$ PRINT 'Test by alpha-numeric soundex PRINT "Alpha-Numeric Soundex: "; FWord$; " and "; PRINT SWord$; " do "; IF ASoundex$(FWord$) <> ASoundex$(SWord$) THEN PRINT "NOT "; END IF PRINT "sound the same." PRINT 'Test by numeric soundex PRINT " Numeric Soundex: "; FWord$; " and "; PRINT SWord$; " do "; IF ISoundex%(FWord$) <> ISoundex%(SWord$) THEN PRINT "NOT "; END IF PRINT "sound the same." PRINT LOOP END FUNCTION ASoundex$ (InWord$) STATIC Word$ = UCASE$(InWord$) Work$ = LEFT$(Word$, 1) + "000" WkPos = 2 PrevCode = 0 FOR L = 2 TO LEN(Word$) Temp = INSTR("BFPVCGJKQSXZDTLMNR", MID$(Word$, L, 1)) IF Temp THEN Temp = ASC(MID$("111122222222334556", Temp, 1)) IF Temp <> PrevCode THEN MID$(Work$, WkPos) = CHR$(Temp) PrevCode = Temp WkPos = WkPos + 1 IF WkPos > 4 THEN EXIT FOR END IF ELSE PrevCode = 0 END IF NEXT ASoundex$ = Work$ END FUNCTION FUNCTION ISoundex% (InWord$) STATIC Word$ = UCASE$(InWord$) Work$ = "0000" WkPos = 1 PrevCode = 0 FOR L = 1 TO LEN(Word$) Temp = INSTR("BFPVCGJKQSXZDTLMNR", MID$(Word$, L, 1)) IF Temp THEN Temp = ASC(MID$("111122222222334556", Temp, 1)) IF Temp <> PrevCode THEN MID$(Work$, WkPos) = CHR$(Temp) PrevCode = Temp WkPos = WkPos + 1 IF WkPos > 4 THEN EXIT FOR END IF ELSE PrevCode = 0 END IF NEXT ISoundex% = VAL(Work$) END FUNCTION
The first function, ASoundex, follows the standard Soundex definition and returns the result as a string. The ISoundex version cheats slightly by coding the first letter as a number, but it returns an integer value instead of a string. Because integer searches are many times faster than string searches, this version will be better when thousands--or even hundreds of thousands--of names must be examined.
An additional benefit of the integer-only method is that it allows for variations on the first letter. For example, if you enter Cane and Kane in response to the prompts from SOUNDEX.BAS ASoundex will not recognize the names as sounding alike where ISoundex will.
No discussion of searching and sorting would be complete without a mention of linked lists and other data links. Unlike arrays where all of the elements lie in adjacent memory locations, linked data is useful when data locations may be disjointed. One example is the linked list used by the DOS File Allocation Table (FAT) on every disk. As I described in Chapter 6, the data in each file may be scattered throughout the disk, and only through a linked list can DOS follow the thread from one sector in a file to another.
Another example where linked data is useful--and the one we will focus on here--is to keep track of memo fields in a database. A memo field is a field that can store freeform text such as notes about a sales contact or a patient's medical history. Since these fields typically require varying lengths, it is inefficient to reserve space for the longest one possible in the main database file. Therefore, most programs store memo fields in a separate disk file, and use a *pointer field* in the main data file to show where the corresponding memo starts in the dedicated memo file. Similarly, a back pointer adjacent to each memo identifies the record that points to it. This is shown in Figure 8-5 below.
Here, the pointer in the main data file record is a long integer that holds the byte offset into the memo file where the corresponding memo text begins. And just before the memo text is an integer record number that shows which record this memo belongs to. (If you anticipate more than 65,535 records a long integer must be used instead.) Thus, these pointers provide links between the two files, and relate the information they contain.
When a new record is added to the main file, the memo that goes with it is appended to the end of the memo file. BASIC's LOF function can be used to determine the current end of the memo file, which is then used as the beginning offset for the new memo text. And as the new memo is appended to MEMO.DAT, the first data actually written is the number of the new record in the main data file.
The record number back pointer in the memo file is needed to allow memo data to be edited. Since there's no reasonable way to extend memo text when other memo data follows it, most programs simply abandon the old text, and allocate new space at the end of the file. The abandoned text is then marked as such, perhaps by storing a negative value as the record number. Storing a negative version of the abandoned data's length is ideal, because that both identifies the data as obsolete, and also tells how much farther into the file the next memo is located.
The idea here is that you would periodically run a memo file maintenance program that compacts the file, thus eliminating the wasted space the abandoned memos occupy. This is similar to the DBPACK.BAS utility shown in Chapter 7, and also similar to the way that BASIC compacts string memory when it becomes full. But when an existing memo is relocated in the memo file, the field in the main data file that points to the memo must also be updated. And that's why the record number back pointer is needed: so the compaction program can know which record in the main file must be updated.
The "L" identifier in the memo file in Figure 8-5, shown between the record number and memo text, is a length byte or word that tells how long the text is. If you plan to limit the memo field lengths to 255 or fewer characters, then a single byte is sufficient. Otherwise an integer must be used. An example of code that reads a data record and then its associated memo text is shown below.
GET #MainFile, RecNumber, TypeVar MemoOffset& = TypeVar.MemoOff GET #MemoFile, MemoOffset& + 2, MemoLength% Memo$ = SPACE$(MemoLength%) GET #MemoFile, , Memo$
The first step reads a record from the main data file into a TYPE variable, and the second determines where in the memo file the memo text begins. Two is added to that offset in the second GET statement, to skip over the record number back pointer which isn't needed here. Once the length of the memo text is known, a string is assigned to that length, and the actual text is read into it.
If you are using long integer record numbers you would of course use MemoOffset& + 4 in the second GET. And if you're using a single byte to hold the memo length you would define a fixed-length string to receive that byte:
DIM Temp AS STRING *1 GET #MemoFile, MemoOffset& + 2, Temp MemoLength = ASC(Temp)
Since BASIC doesn't offer a byte-sized integer data type, ASC and STR$ can be used to convert between numeric and string formats.
The last issue related to array and memory manipulation I want to cover is inserting and deleting elements. If you intend to maintain file indexes or other information in memory and in sorted order, you will need some way to insert a new entry. By the same token, deleting an entry in a database requires that the parallel index entry also be deleted.
The most obvious way to insert or delete elements in an array is with a FOR/NEXT loop. The first example below inserts an element, and the second deletes one.
'----- Insert an element: Element = 200 InsertValue = 999 FOR X = UBOUND(Array) TO Element + 1 STEP -1 Array(X) = Array(X - 1) NEXT Array(Element) = InsertValue '----- Delete an element: Element = 200 FOR X = Element TO UBOUND(Array) - 1 Array(X) = Array(X + 1) NEXT Array(UBOUND(Array)) = 0 'optionally clear last element
For integer, long integer, and fixed-length arrays this is about as efficient as you can get, short of rewriting the code in assembly language. However, with floating point and string arrays the performance is less than ideal. Unless a numeric coprocessor is installed, floating point values are assigned using interrupts and support code in the emulator library. This adds an unnecessary level of complication that also impacts the speed. When strings are assigned the situation is even worse, because of the memory allocation overhead associated with dynamic string management.
A better solution for floating point and string arrays is a series of SWAP statements. The short program below benchmarks the speed difference of the two methods, as it inserts an element into a single precision array.
REDIM Array(1 TO 500) CLS Element% = 200 InsertValue = 999 Start = TIMER FOR A% = 1 TO 500 FOR X% = UBOUND(Array) TO Element% + 1 STEP -1 Array(X%) = Array(X% - 1) NEXT Array(Element%) = InsertValue NEXT Done = TIMER PRINT USING "##.## seconds when assigning"; Done - Start Start = TIMER FOR A% = 1 TO 500 FOR X% = UBOUND(Array) TO Element% + 1 STEP -1 SWAP Array(X%), Array(X% - 1) NEXT Array(Element%) = InsertValue NEXT Done = TIMER PRINT USING "##.## seconds when swapping"; Done - Start
If you run this program in the BASIC environment, the differences may not appear that significant. But when the program is compiled to an executable file, the swapping method is more than four times faster. In fact, you should never compare programming methods using the BASIC editor for exactly this reason. In many cases, the slowness of the interpreting process overshadows significant differences between one approach and another.
String arrays also benefit greatly from using SWAP instead of assignments, though the amount of benefit varies depending on the length of the strings. If you modify the previous program to use a string array, also add this loop to initialize the elements:
FOR X% = 1 TO 500 Array$(X%) = "String number" + STR$(X) NEXT
With BASIC PDS far strings the difference is only slightly less at about three to one, due to the added complexity of far data. Also, SWAP will always be worse than assignments when inserting or deleting elements in a fixed-length string or TYPE array. An assignment merely copies the data from one location to another. SWAP, however, must copy the data in both directions.
Understand that when using SWAP with conventional string arrays, the data itself is not exchanged. Rather, the four-byte string descriptors are copied. But because BASIC program modules store string data in different segments, extra work is necessary to determine which descriptor goes with which segment. When near strings are being used, only six bytes are exchanged, regardless of the length of the strings. Four bytes hold the descriptors, and two more store the back pointers.
This chapter explained many of the finer points of sorting and searching all types of data in BASIC. It began with sorting concepts using the simple Bubble Sort as a model, and then went on to explain indexed and multi-key sorts. One way to implement a multi-key sort is by aligning the key fields into adjacent TYPE components. While there are some restrictions to this method, it is fairly simple to implement and also very fast.
The Quick Sort algorithm was shown, and the SEEQSORT.BAS program on the accompanying disk helps you to understand this complex routine by displaying graphically the progress of the comparisons and exchanges as they are performed. Along the way you saw how a few simple modifications to any string sort routine can be used to sort regardless of capitalization, or based on only a portion of a string element.
You also learned that writing a truly general sort routine that can handle any type of data requires dealing exclusively with segment and address pointers. Here, assembly language routines are invaluable for assisting you when performing the necessary comparisons and data exchanges. Although the actual operation of the assembly language routines will be deferred until Chapter 12, such routines may easily be added to a BASIC program using .LIB and .QLB libraries.
I mentioned briefly the usefulness of packing and aligning data when possible, as an aid to fast sorting. In particular, dates can be packed to only three bytes in Year/Month/Day order, and other data such as zip codes can be stored in long integers. Because numbers can be compared much faster than strings, this helps the sorting routines operate more quickly.
Array searching was also discussed in depth, and both linear and binary search algorithms were shown. As with the sorting routines, searching can also employ UCASE$ and MID$ to search regardless of capitalization, or on only a portion of each array element. Two versions of the Soundex algorithm were given, to let you easily locate names and other data based on how they sound.
Besides showing the more traditional searching methods, I presented routines to determine the minimum and maximum values in a numeric array. I also discussed some of the ramifications involved when searching floating point data, to avoid the inevitable rounding errors that might cause a legitimate match to be ignored.
Finally, some simple ways to insert and delete elements in both string and numeric arrays were shown. Although making direct assignments in a loop is the most obvious way to do this, BASIC's often-overlooked SWAP command can provide a significant improvement in speed.
The next chapter will conclude this section about hands-on programming by showing a variety of program optimization techniques.
By Sane <sane@telia.com>
In this article we (or actually I) will talk about texture mapping.
I know I said I was gonna write about optimizing the poly routines, but that's really up to you, and texture mapping is a lot more interesting :)
But first, the gouraud shading poly routine had a bug in it which I said I'd give you a fix for if I figured out what was wrong, remember?
The bug is easily solved by changing
FOR x=x1 to x2
in the gLINE routine, to
FOR x=INT(x1) to x2
That way we make sure no numbers will be rounded so that a pixel is skipped or such.
For all the lazy people out there, there's a file called gpolyfix.bas with the downloadable version of this issue. In it, I also changed the PPS (polys per second) counter so that it's more accurate.
The texture mapping I'm writing about in this article doesn't give the best visual results, since it doesn't correct for perspective and such, but it's one of the fastest ones. When I decided to write an article about texture mapping instead of optimizing, I started to look around for information about it, since I didn't know how it was done at the time :) I didn't understand much, partially cause I wanted to learn it quickly, and didn't take the time to understand it, but also cause a lot of the tutorials made it seem a lot harder than it is. Today I tried an idea I came up with though, and it worked, so I finally started writing the article :)
Texture mapping is really a very simple thing to do, pretty similar to gouraud shading. You do exactly the same thing as you did when interpolating color values in gouraud shading, only that you do everything twice, since we're using two texture coordinates (U and V) instead of one color value (C). When changing the gPoly function into tPoly, I didn't do much but to replace c1,c2 and so on with u1,u2 and so on, and adding v1,v2 and so on (quite a lot of so on, I know, but I don't know how to write it in a better way :) Same thing could be done with gLINE, although I wrote tLINE from scratch, since that took less time :)
Here's the code from tPoly, also available from tpoly.bas in the downloadable version of QBCM:
'Made by Sane at the 1st of August 2001, for QBCM SUB tPoly (xx1, yy1, xx2, yy2, xx3, yy3, uu1, vv1, uu2, vv2, uu3, vv3) 'Declare an array for storing slopes DIM poly(199, 1) 'Declare arrays for texture coordinates DIM upos(199, 1) DIM vpos(199, 1) 'Point and texture coordinate sorting IF yy1 < yy2 AND yy1 < yy3 THEN x1 = xx1: y1 = yy1: u1 = uu1: v1 = vv1 IF yy2 < yy1 AND yy2 < yy3 THEN x1 = xx2: y1 = yy2: u1 = uu2: v1 = vv2 IF yy3 < yy1 AND yy3 < yy2 THEN x1 = xx3: y1 = yy3: u1 = uu3: v1 = vv3 IF yy1 > yy2 AND yy1 > yy3 THEN x3 = xx1: y3 = yy1: u2 = uu1: v2 = vv1 IF yy2 > yy1 AND yy2 > yy3 THEN x3 = xx2: y3 = yy2: u2 = uu2: v2 = vv2 IF yy3 > yy1 AND yy3 > yy2 THEN x3 = xx3: y3 = yy3: u2 = uu3: v2 = vv3 IF yy1 <> y1 AND yy1 <> y3 THEN x2 = xx1: y2 = yy1: u3 = uu1: v3 = vv1 IF yy2 <> y1 AND yy2 <> y3 THEN x2 = xx2: y2 = yy2: u3 = uu2: v3 = vv2 IF yy3 <> y1 AND yy3 <> y3 THEN x2 = xx3: y2 = yy3: u3 = uu3: v3 = vv3 'Calculating of the slope and texture coordinates from point 1 to point 2 x = 0 xm = 0 u = 0 um = 0 v = 0 vm = 0 IF x1 + x2 <> 0 AND y1 + y2 <> 0 THEN xm = (x1 - x2) / (y1 - y2) IF u1 + u2 <> 0 AND y1 + y2 <> 0 THEN um = (u1 - u2) / (y1 - y2) IF v1 + v2 <> 0 AND y1 + y2 <> 0 THEN vm = (v1 - v2) / (y1 - y2) FOR y = y1 TO y2 poly(y, 0) = x + x1 upos(y, 0) = u + u1 vpos(y, 0) = v + v1 x = x + xm u = u + um v = v + vm NEXT y 'Calculating of the slope and texture coordinates from point 2 to point 3 x = 0 xm = 0 u = 0 um = 0 v = 0 vm = 0 IF x2 + x3 <> 0 AND y2 + y3 <> 0 THEN xm = (x2 - x3) / (y2 - y3) IF u2 + u3 <> 0 AND y2 + y3 <> 0 THEN um = (u2 - u3) / (y2 - y3) IF v2 + v3 <> 0 AND y2 + y3 <> 0 THEN vm = (v2 - v3) / (y2 - y3) FOR y = y2 TO y3 poly(y, 0) = x + x2 upos(y, 0) = u + u2 vpos(y, 0) = v + v2 x = x + xm u = u + um v = v + vm NEXT y 'Calculating of the slope and texture coordinates from point 1 to point 3 m = 0 x = 0 u = 0 um = 0 v = 0 vm = 0 IF x1 + x3 <> 0 AND y1 + y3 <> 0 THEN m = (x1 - x3) / (y1 - y3) IF u1 + u3 <> 0 AND y1 + y3 <> 0 THEN um = (u1 - u3) / (y1 - y3) IF v1 + v3 <> 0 AND y1 + y3 <> 0 THEN vm = (v1 - v3) / (y1 - y3) FOR y = y1 TO y3 poly(y, 1) = x + x1 upos(y, 1) = u + u1 vpos(y, 1) = v + v1 x = x + m u = u + um v = v + vm NEXT y 'The easiest part, drawing FOR y = y1 TO y3 tLINE poly(y, 0), poly(y, 1), y, upos(y, 0), vpos(y, 0), upos(y, 1), vpos(y, 1) NEXT y END SUB
Code for tLINE:
'Made by Sane at the 1st of August 2001, for QBCM SUB tLINE (x1, x2, y, u1, v1, u2, v2) u = u1 um = 0 v = v1 vm = 0 IF u1 - u2 <> 0 AND x1 - x2 <> 0 THEN um = (u1 - u2) / (x1 - x2) IF v1 - v2 <> 0 AND x1 - x2 <> 0 THEN vm = (v1 - v2) / (x1 - x2) FOR x = INT(x1) TO x2 PSET (x, y), texture(u, v) u = u + um v = v + vm NEXT x END SUB
And the main testing code:
DECLARE SUB tPoly (xx1!, yy1!, xx2!, yy2!, xx3!, yy3!, uu1!, vv1!, uu2!, vv2!, uu3!, vv3!) DECLARE SUB tLINE (x1!, x2!, y!, u1!, v1!, u2!, v2!) 'Made by Sane at the 1st of August 2001, for QBCM SCREEN 13 'Declare a variable for holding the texture DIM SHARED texture(31, 31) AS INTEGER 'Fill the texture with random stuff FOR y = 0 TO 31 FOR x = 0 TO 31 texture(x, y) = INT(RND * 255) NEXT x NEXT y 'Setting color for PPS (Poly Per Second) rate text COLOR 15 oldtimer! = TIMER DO UNTIL INKEY$ = CHR$(27) x1 = INT(RND * 320) x2 = INT(RND * 320) x3 = INT(RND * 320) y1 = INT(RND * 200) y2 = INT(RND * 200) y3 = INT(RND * 200) tPoly x1, y1, x2, y2, x3, y3, INT(RND * 31), INT(RND * 31), INT(RND * 31), INT(RND * 31), INT(RND * 31), INT(RND * 31) polynum = polynum + 1 IF TIMER > oldtimer! + 1 THEN s = s + 1: LOCATE 1, 1: PRINT polynum / s: oldtimer! = TIMER LOOP
That's all for this time. As always, mail any comments/suggestions/questions to sane@telia.com. I still don't get more than 2-3 comments on my articles, which doesn't feel very nice, since I don't know if anyone reads them...
Next article will be an introduction into programming a 3D engine, that we'll do a few articles from now on.
See ya in next issue,
-Sane
By Matt2Jones <matt2jones@yahoo.com>
Hello fellow coders of QB,
I am Matt2jones, a Qb programmer, probably the only one left, who is still
OBSESSED with RPGs. I will hopefully be posting a series of articles in QBCM
about RPG developement, and how to finish one. This first article will be more
of an introduction than a tutorial, but the rest WILL be better.
First up, "What the fuck is an RPG?" Well, an RPG is a Role Playing Game (Believe it or not, it took me FOUR MONTHS to make that association!), this means the player TAKES ON THE ROLE of a character and lives in their universe (Am I sounding familiar? If I am, stop reading because you've heard this stuff allready.). USUALLY an event of great importants is taking place while the player controlls the character, but I stress the word "usually". Most RPGs tend to focus around the Plot and character developement more than fast action, and that suits most people fine (No "tricky" INKEY$ functions nessery, just reliable old INPUT A$!;).
An RPG also tends to have whats called a "Turn Based Battle system". Ie, you attack, the enemy attacks, you attack, the enemy attacks (you take turns). While some of the more... Profesional RPGs are slowly loosing this technique (FFIX for example), it is still my favorite method of combat.
Most Qb RPGs tend to be made with "Scrolling Tile Engines", a method I'll discuss in the next issue along with many others, and nice, colourful tiles. That is NOT nessesery. Infact, the first thing about making an RPG should be the Story, then the Graphics, then the Story, then the Engine, then the Story. Well, perhaps I exagerate a little;-), presentation and ease of use are also important factors, but to this RPGer, Stories come FIRST!
Now I've rambled on long enough (And I don't care if you still don't know what an RPG is, you can find definitions almost everywhere!), its time to get to the main part of this article...
How am I supposed to code this new-fangled RPGamabob? Well, this was probably the step in the ladder that stumped me the most. I was all set to have Waters of Europa the greatest thing since FFVII, but I didn't know where to start. Well, I'll tell you, because no-one told me: Story and Graphics.
Write the story anywhere, and draw the graphcis anywhere (preferably on the computer though, and no smartass, not ON the screen:P But make sure you have the basics done, that way once you get the Engine up and running, you won't have to wait weeks to get graphics to test it.
Well after you finish that stuff it's time to move onto the Engine . To help you I have, in the course of my long Qblife (2 years), devised 3 main ways:
Script Orientated BASIC Orientated (aka CheckStuff) CHAIN Orientated L And a subset of CHAIN Orientated... SUB Orientated!Script Orientated is the most professional and is used for Epic, 1 Meg+ games. BASIC Orientated is the next step down, usually for smaller games (80k). CHAIN Orientated is the most unstructured peice of shit you are ever going to come across, and are probably the most fun to make. I've seen Epic CHAIN games (Elysian Field... NO, SCREW YOU, THAT GAME ROCKED!), and smaller ones which are too insignificant to mention (Okay, Hellpit 2&3). And SUB Orientated is a subset of CHAIN because it is mostly used in that type of game, but SUB games can also be RPGs in their own right (Hellpit/Demon Hunter/Fantassy Power Zone).
Now that was a lot of unhelpful blabber, so I'm going to explane how to make each one now.
In the main module you should have something like this:
DEFINT A-Z DIM SHARED HP, MP, MHP, MMP, ST, DF, GOLD, X, Y, LOCATION, STORY DIM MAP(63, 63, 3) SETUP TITLE A=OPTIONS IF A = 0 THEN INITNEWVARS ELSE LOADOLDVARS A END IF GAME SYSTEM
As you can see, this is pretty much a long list of Sub calls, and some dimensioning Variables.
HP and MP are obvious enough (Health Points and Magic Points), mHp just means Max Health Points, and mMP the same with Magic Points.
St is your strenght (How much damage you can inflict), Df your defense (How much of your enemies damage you ignore).
Gold is how much money you have (Duh).
X and Y are your coordinents on the map.
Location is what map you are on. 0 would be the world map, 1 the town and 2 the Cave or something.
Story is like what chapter of the game you are on. Everytime you have a meaningful cut-scene or something, increase Story by 1.
Map(63, 63, 3) means that there will be 4 maps of 64x64 tiles (for scrolling Tile engines).
Setup does things like set up the screen mode, load fonts and initiate mouse.
Title is just a LoadPalette call, bload a picture to the screen and wait for a keypress.
Options is a function which determins if you want to start a new game or load an old one. If you want to start a new game it returns 0, otherwise it returns the save number.
InitNewVars sets all the variables to their starting values.
LoadOldVars load a save.
Game is the actual engine.
I trust you could write those subs yourself (bar Game)?
Okay, now lets pretend you were writting a SCRIPT RPG. Your game sub would look like this:
DEFINT A-Z SUB Game() DrawScreen 'Take a wild guess what this does DO: A$ = RIGHT$(INKEY$, 1) 'Loads the key that is pressed from the keyboard into A$ IF A$ = "H" THEN MOVEUP 'If the Uparrow is pressed, Call sub MoveUp IF A$ = "P" THEN MOVEDOWN 'The same but down arrow and MoveDown IF A$ = "K" THEN MOVELEFT 'Left arrow, Move Left IF A$ = "M" THEN MOVERIGHT 'I wonder... IF A$ = " " THEN STATS 'Display players Health and stuff. IF A$ = CHR$(27) THEN QUIT 'End the game IF A$ <> "" THEN INTERPRET: DRAWSCREEN 'Interpret the script and draw the screen. LOOP END SUB
Okay...
A summary of what the called subs do as follows:
DrawScreen | - | Draws the screen |
MoveUp | - | Checks if the square above is empty, and if it is moves you to it. |
MoveDown | - | Checks if the square below is empty, and if it is moves you to it. |
MoveLeft | - | Checks if the square to the left is empty, and if it is moves you to it. |
MoveRight | - | Checks if the square to the right is empty, and if it is moves you to it. |
STATS | - | Clear the screen and show something like this: |
QUIT | - | Asks they player if they want to quit and if they say yes, ends the program. |
INTERPRET | - | The script part of the game, and guess what? I'm not going to explain scripts to you! Ha ha! You're going to have to download QBTM and read the tutorials there! Ha Ha!! You loose!! Ha Ha ha ha...ha....HA HA! |
Now, if you were writting a BASIC orientated engine your Main Module would look exactly the same as above, and the subs, except for game, would do exactly the same things.
Game would look like this:
'------------------- DEFINT A-Z SUB Game() DrawScreen 'Take a wild guess what this does DO: A$ = RIGHT$(INKEY$, 1) 'Loads the key that is pressed from the keyboard into A$ IF A$ = "H" THEN MOVEUP 'If the Uparrow is pressed, Call sub MoveUp IF A$ = "P" THEN MOVEDOWN 'The same but down arrow and MoveDown IF A$ = "K" THEN MOVELEFT 'Left arrow, Move Left IF A$ = "M" THEN MOVERIGHT 'I wonder... IF A$ = " " THEN STATS 'Display players Health and stuff. IF A$ = CHR$(27) THEN QUIT 'End the game IF A$ <> "" THEN CheckStuff: DRAWSCREEN 'Interpret the script and draw the screen. LOOP END SUB '-------------------
All the subs listed in that block do the same as they did in the one above, except for "CheckStuff". The CheckStuff sub should look like this:
'------------------- DEFINT A-Z SUB CheckStuff IF STORY = 0 AND X = 12 AND Y = 2 THEN CUT1: Story = 1 IF STORY = 0 AND X = 13 AND Y = 6 THEN CUT2: Story = 1 IF STORY = 1 AND X = 44 AND Y = 2 THEN CUT3: Story = 2 IF STORY = 2 AND X = 3 AND Y = 16 THEN CUT4: Story = 3 IF STORY = 3 AND X = 12 AND Y = 7 THEN WIN: System END SUB '-------------------
Cut1 to 4 are just anamations and talking between characters, while WIN is the complete game screen. You can write a Cut sub however you want, you could even use GOTO in it (Fawning, hot, sexy redhaired french-girl: OOOoooo-He's so brave:-)!
Now CHAIN Orientated games get their name because every part of the map (like shops, towns and dungeons) are ALL seporate files full of QB code, that are linked together by CHAIN commands. I said these are fun to make because you can have a group of 12 or so people and have each person make one part of the game, stick all the files in one folder and run it. It's soooo funny seeing the different graphics styles in one game. And the different ways the character looks. Try it.
Now the way these games are structured is quite different:
You have a folder containing files like these:
INIT.BAS
TITLE.BAS
LOADSAVE.BAS
NEWGAME.BAS
OPTIONS.BAS
VILL1.BAS
VILL2.BAS
WORLD.BAS
CASTLE.BAS
SHOP.BAS
FIGHT.BAS
INN.BAS
WIN.BAS
DIE.BAS
SAVE.BAS
Each of these files contains different aspect of the game.
INIT setup the screen mode, and commons all the variables, then CHAINs to TITLE.BAS.
TITLE bloads the title screen and wait for a key to be pressed, then CHAINs to OPTIONS.BAS.
OPTIONS gives the player a choice then either CHAINs to NEWGAME.BAS or LOADSAVE.BAS.
NEWGAME sets all variables to their default and CHAINs to VILL1.BAS
LOADSAVE load the variables from a file and CHAINs to the last file the player was in.
VILL1 has got the drawscreen, movement and cutscenes for the first village, when you leave the village it CHAINs to WORLD.BAS, when you enter a shop it writes "Vill1.bas" to the end of a datafile, and CHAINs to Shop.
When you leave the shop the "Vill1.bas" is read from the end of the Datafile, and Vill1.bas is chained to.
Do you get the picture of how these work? Do you see why my insane anarchistic mind finds them so funny? Do you? DO YOU?! Well I don't give a shit, because now I'm talking about SUB ORIENTATED GAMES!
These are for VERY simple, Town & Dungeon RPGs. The Main looks somewhat like this:
'------------------- DEFINT A-Z DIM SHARED HP, MP, MHP, MMP, ST, DF, GOLD, X, Y, LOCATION, STORY DIM MAP(63, 63, 3) SETUP TITLE A=OPTIONS IF A = 0 THEN INITNEWVARS ELSE LOADOLDVARS A END IF VILLAGE SYSTEM '-------------------
Look familiar? It should. As you can see, most RPGs have the same BASIC setup, it's whats inside the subs that make the difference. We could learn alot from the RPG, for example, many people look the same. Some are black, some are white, but everyone is different inside, and it's whats inside that makes the difference.
Basicly, the village sub holds movement code, has a drawscreen subroutine and calls the shops when they are needed. If you walk on a certain spot, it calls the Dungeon Sub. This sub is practicly the same as the Village sub, except it has random fights. When you walk on the exit from the Dungeon, EXIT SUB is called. Simple.
Well thats all for now. Next month (or two if QBCM goes Bimonthly) I will talk about the different engines for RPGs out there.
Be Excellent to each other...
This is the last surviving member of the <insert name of alien 1 ship here>, matt2jones, signing off...
email me at Matt2jones@yahoo.com.
Goto my webpage at http://members.nbci.com/matt2jones/
RPG tip of the Month: When writting an RPG, don't use any methods you have never tried before because they will slow procuction as you figure them out, and might even cause you to drop your project if you can't get it to work. |
By Golrien <q@golrien.cjb.net>
Okay, most people already know about Windows BMPs. But I'm lazy, and they're easy to write about. And as a bonus, I'll explain RLE compression, which most BMP specifications leave out.
The BMP (short for Bitmap) image format is one of the simplest image formats in existance. It's also one of the least flexible, was devised by Microsoft, has several quirks and only the most basic of compression.
There are Windows BMPs and there are OS/2 BMPs. This stems from the time when M$ and IBM were working together on OS/2, but then M$ ran off with the source and made Windoze on their own, leaving IBM with the bills to pay, in true Bill Gates style. The OS/2 ones are possibly better (yay IBM) and I might cover them some other time. This tut only covers Windows-format BMPs.
Bitmap image files have a header. Most files have headers, and the best way to store the header info is in a TYPE definition. This is the BMP header, as a TYPE:
TYPE BMPheaderType ID AS STRING * 2 'Should be 'BM' for a windows BMP. FileSize AS LONG 'Size of the whole file. Reserved AS STRING * 4 ImageOffset AS LONG 'Offset of image data in file. InfoHeaderLength AS LONG 'The BitmapInfoHeader starts directly after ' this header. It could be: ' 12 bytes - OS/2 1.x format, or ' 40 bytes - Windows 3.x format, or ' 64 bytes - OS/2 2.x format. ImageWidth AS LONG 'Width and height of the image, in pixels. ImageHeight AS LONG ' - NumPlanes AS INTEGER 'Number of planes. BPP AS INTEGER 'Bits per pixel, the colour depth. Could be: ' 4 bit - 16 colours. ' 8 bit - 256 colours. ' 24 bit - A lot more colours. CompressionType AS LONG 'Type of compression. ' 0 - uncompressed, ' 1 - RLE 8-bit/pixel ' 2 - RLE 4-bit/pixel ImageSize AS LONG 'Size of image data in bytes. xRes AS LONG 'Horizontal and vertical resolution of the yRes AS LONG ' image. NumColsUsed AS LONG 'Number of used colours and number of NumColsImportant AS LONG ' important colours. END TYPE
Now all you have to do to get all the information is:
DIM SHARED BMPheader AS BMPheaderType OPEN BMPfile$ FOR BINARY AS #1 GET #1, , BMPheader
In a 24bpp BMP, the last two entries will be zero (they are in mine, anyway). In a 4- or 8-bit BMP, a palette follows the header. Each colour (there are 16 for a 4-bit and 256 for an 8-bit) has its own entry, with a byte for the red, green and blue.
Unfortunately, it's not that simple.
To start with, the colours are actually stored BGR, not RGB. Secondly, there's a byte of filler after every colour. This wastes 256 bytes, which is a quarter of a kilobyte. More attempts from M$ to waste disk space. Oh, and you'll have to divide the attributes by four to get them to how the VGA likes them. This, however, will load the palette.
'Reset the VGA palette ports. ' OUT &H3C8, 0 'Template strings, so QB knows how many bytes to get out of the file 'at a time. ' Red$ = " ": Green$ = " ": Blue$ = " " Byte$ = " " '2 ^ BMPheader.BPP is the number of colours used in the file. ' FOR i = 1 TO 2 ^ BMPheader.BPP 'Because the BMP stores the palette BGR, and the VGA takes it in 'as RGB, we need to get all the palette values for the colour and 'give them to the VGA one at a time, in reverse order (not 'forgetting the byte of filler): ' GET #1, , Blue$ GET #1, , Green$ GET #1, , Red$ GET #1, , Byte$ 'All the colour attributes must be divided by 4, as they go 0-255 'whereas the VGA card prefers 0-63. ' OUT &H3C9, ASC(Red$) \ 4 OUT &H3C9, ASC(Green$) \ 4 OUT &H3C9, ASC(Blue$) \ 4 NEXT i
That will set the palette up for viewing the BMP. Of course, if you're using SVGAQB or DirectQB or Future.Lib (?) or any other library that likes palettes in strings, you won't just be able to grab the whole string at once. Oh, no, thanks to Bill Gates' backwards nature, you have to do a whole load of screwing around first. This code will load the BMP palette into a string * 768 of RGB byte values, for use with DQBsetPal or whatever (only for 8-bit BMPs).
DIM SHARED Pal AS STRING * 768 'Reset the palette string. ' Pal = "" Red$ = " ": Green$ = " ": Blue$ = " ": Byte$ = " " FOR i = 1 TO 2 ^ BMPheader.BPP GET #1, , Blue$ GET #1, , Green$ GET #1, , Red$ GET #1, , Byte$ Pal = Pal + CHR$(ASC(Red$) \ 4) + CHR$(ASC(Green$) \ 4) + CHR$(ASC(Blue$) \ 4) NEXT i
However you've loaded the palette, the only thing remaining is to grab the image data. However, Micro$oft being Micro$oft, our troubles are not yet over. Whilst the conventional monitor prefers its scanlines to run from top to bottom, the BMP image is stored UPSIDE DOWN! The scanlines still run left to right, but the bottom line is first in the file and the top last. Fortunately, it is easier to get over than the palette difficulties, because it is possible to draw the scanlines in any order. So, for an 8-bit BMP:
Byte$ = " " FOR y = BMPheader.ImageHeight - 1 TO 0 STEP -1 FOR x = 0 TO BMPheader.ImageWidth - 1 GET #1, , Byte$ PSET (x, y), ASC(Byte$) NEXT NEXT
Four-bit BMPs are more difficult, as they have two pixels per byte...
Byte$ = " " FOR y = BMPheader.ImageHeight - 1 TO 0 STEP -1 FOR x = 0 TO BMPheader.ImageWidth - 1 STEP 2 GET #1, , Byte$ LowNibble = ASC(Byte$) \ 16: HighNibble = ASC(Byte$) AND 15 PSET (x, y), LowNibble: PSET (x + 1, y), HighNibble NEXT NEXT
However, the hardest to draw are 24-bit BMPs. Of course, in a hi-col mode it would be simple (simpler than an 8-bit BMP), but QB can't do that, Future.lib has its own BMP functions and this is a BMP tut, not a VESA tut. There are, however, a few ways to load 24-bit BMPs into SCREEN 13. For those that don't know, hi-col screenmodes have no palette, each pixel stores its own colour attributes, so to PSET you have to use an RGB value. 24-bit bmps are stored like that, with three bytes per pixel, one per colour.
The first trick we can play is to greyscale the image. This is easy, just set up a palette of 256 grey shades and add together the RGB values and average them to get a grey value. Then we put that as the colour. That was probably a crap explanation, so here's the code to do it.
'Grey out the palette. ' OUT &H3C8, 0 FOR i = 0 TO 255 OUT &H3C9, i \ 4 OUT &H3C9, i \ 4 OUT &H3C9, i \ 4 NEXT i Red$ = " ": Green$ = " ": Blue$ = " " FOR y = BMPheader.ImageHeight - 1 TO 0 STEP -1 FOR x = 0 TO BMPheader.ImageWidth - 1 GET #1, , Blue$ GET #1, , Green$ GET #1, , Red$ PSET (x, y), (ASC(Red$) + ASC(Green$) + ASC(Blue$)) \ 3 NEXT NEXT
That actually worked the first time I coded it and I never even misspelt a command which made me slightly happier than I usually am. However, I'm still not happy enough to teach you about reducing the colour depth from 24-bit to 8-bit.
It is possible to compress BMPs. Only some of them, the 24-bit ones cannot be compressed and the algorithm for 4- and 8-bit compression sucks pretty much. But it is useful to know how.
The BMP can be compressed in two modes, absolute mode and RLE mode. Both modes can occur anywhere in a single bitmap.
The RLE mode is a very simple, the first byte contains the count and the second the pixel to be replicated (if this makes no sense, don't worry, just cut and paste the code and pretend you wrote it yourself (no, don't do that)). If the count byte is zero the second byte is a special byte.
In absolute mode, the second byte contains the number of bytes to be copied exactly. Each absolute run is word-aligned, which means it may be padded with an extra byte to make the numbers round. After an absolute run, RLE compression continues.
The second bytes after a zero count can be: 0 - end of line. 1 - end of bitmap. 2 - delta - move to a new X and Y position. 3+ - switch to absolute mode.
'RLE-8 compression. Yay. RLE images also have the bottom line 'first, just to make things *really* wierd. ' xPos = 0 yPos = BMPheader.ImageHeight - 1 DO GET #1, , Byte$: ByteCount = ASC(Byte$) IF ByteCount = 0 THEN 'Special code. ' GET #1, , Byte$: Code = ASC(Byte$) IF Code = 0 THEN 'End of line. ' xPos = 0: yPos = yPos - 1 ELSEIF Code = 1 THEN 'End of image. ' EXIT DO ELSEIF Code = 2 THEN 'Delta. ' GET #1, , Byte$ xPos = xPos + ASC(Byte$) yPos = yPos - ASC(Byte$) ELSE 'Absolute mode. ' FOR i = 1 TO Code GET #1, , Byte$ PSET (xPos, yPos), ASC(Byte$) xPos = xPos + 1 NEXT i 'Remember that the bytes must be word-aligned. ' IF Code MOD 2 <> 0 THEN GET #1, , Byte$ END IF ELSE 'Just plain vanilla RLE encoding here. ' GET #1, , Byte$: PixelColour = ASC(Byte$) FOR i = 1 TO ByteCount PSET (xPos, yPos), PixelColour xPos = xPos + 1 NEXT i END IF LOOP
I don't promise *anything* about the delta code, seeing as I have no images that have one and Paint Shop Pro 7 seems to be allergic to them. However, this is the code used in the Allegro library, so I bet it's right.
Four-bit RLE is pretty much the same, except each byte contains two pixels:
'RLE-4 compression. Yay. ' xPos = 0 yPos = BMPheader.ImageHeight - 1 Byte$ = " " DO GET #1, , Byte$: ByteCount = ASC(Byte$) IF ByteCount = 0 THEN GET #1, , Byte$: Code = ASC(Byte$) IF Code = 0 THEN xPos = 0: yPos = yPos - 1 ELSEIF Code = 1 THEN EXIT DO ELSEIF Code = 2 THEN GET #1, , Byte$: xPos = xPos + ASC(Byte$) GET #1, , Byte$: yPos = yPos - ASC(Byte$) ELSE FOR i = 1 TO Code IF i MOD 2 <> 0 THEN GET #1, , Byte$ LowNibble = ASC(Byte$) AND 15 HighNibble = (ASC(Byte$) \ 16) AND 15 PSET (xPos, yPos), HighNibble ELSE PSET (xPos, yPos), LowNibble END IF xPos = xPos + 1 NEXT i IF Code MOD 4 <> 0 THEN GET #1, , Byte$ END IF ELSE GET #1, , Byte$ LowNibble = ASC(Byte$) AND 15 HighNibble = (ASC(Byte$) \ 16) AND 15 FOR i = 1 TO ByteCount IF i MOD 2 <> 0 THEN PSET (xPos, yPos), HighNibble ELSE PSET (xPos, yPos), LowNibble END IF xPos = xPos + 1 NEXT i END IF LOOP
This code took about four days before I got it working, and then I managed to fix it, at 12:20pm one night. That was about ten minutes after I'd finished condensing a 50-line fire effect into a 15-line fire effect for a competition (look out for Minifire, kids =).
Well, that's all there is to it. Hopefully you can do something with this stuff, and I bet this is the only *complete* tutorial with QB code samples. If you didn't understand any of it, have a look at the example program, BMPTUT.BAS, which should be included somewhere, along with some BMPs. Try adding some watches and stepping through the code or something, it might be vaguely interesting.
I might do another of these tutorials. If I do, it'll probably be on RIFF wave files (WAVs), how to play them and stuff.
By Sane <sane@telia.com>
Matt2Jones suggested having a developers diary thing in QBCM, so I decided to make one for my current project.
All times are in international style, or at least that's what I think it's called :)
All dates are in yyyy-mm-dd format
19:52 2001-06-01
Project was restarted, so that I can get the whole project covered by this log/diary.
Today I've done some planning on graphics, and made a few small routines
20:54 2001-06-01
Made a basic temporary tileset.
19:06 2001-06-05
Haven't done much for a few days now, thought I'd better do something about it... :)
19:31 2001-06-05
I did the basic design for the coding of the game, in pretty much the same way as I always do. Most of the sub-routines and functions are empty as of now, but I'll code them as I go.
I also made a basic double-buffered tile engine, without support for clipping, which makes the 'game' look a bit strange :) I'll call it 'game' from now on, until I think it resembles a game :)
Since last time I did something for this game (four days ago), I've felt pretty tired of it... (I think I've broken my record for time needed to get tired of a project :), but when I started coding on it again, I thought it was fun again :) Strange, isn't it?
It's also a good thing that I'm writing this log for QBCM, which kinda gives me a 'big brother' feeling or something, so that I can't just drop the project :)
19:41 2001-06-05
I think it's time to mention what kinda game it'll be now :)
The game will be named "Apple Eater 2", and no, it isn't a nibbles clone...
It's the followup to a game called "Apple Eater" that might have been released when you read this...check http://vgamesoft.hybd.net/ if you're interested in it. It is finished, it just isn't released, cause some stuff needs to be done with the levels. Ok, it isn't finished, but very close :) My part of it is all done, at least :)
"Apple Eater" is a puzzle/platform game where you're a boy who's gonna eat lotsa apples (with a stunning story behind, that gives an explanation of why he needs to eat all those apples), and "Apple Eater 2" will be a platform game in the style of Superfrog, for those of you who have played it. For you who haven't, I can mention it's a game similar to Sonic, but a lot better (IMHO) The version I've played is for Amiga, but there is a version for PC, which you should be able to find somewhere on the internet (the demo, cause piracy is bad for you :).
I wanna write more, but you probably don't wanna read more, so I'll stop...for now
22:37 2001-06-14
Finally, after 9 days, I got back to working on AE2 :)
Today I've changed the keyhandler from INKEY$ to the Z Keyboard handler, by gza_wu and maz, and made some small stuff, like loading levels when levels are completed (although you can't play any levels yet), and stuff.
I became quite surprised when I noticed how much of the code from AE I can reuse in AE2, when thinking about how different it is (will be)... I could even reuse most(!) of the code if I'd make an RPG, shoot 'em up or whatever, which I think is pretty cool... That's one of the good things about coding sophisticated :)
By looking at the AE code, I also remember a lot of things I need to do, like storing which direction the player is facing (wouldn't be nice if he'd always turn right after walking in any direction, would it? :)
I think AE2 can get finished relatively soon, sooner than I originally thought, even though I'm coding it pretty lazily, with long breaks inbetween :)
14:46 2001-07-16
More than one month has passed, but I finally got back to programming on AE2.
Today I haven't done much important stuff...yet :) Just a few small routines.
22:14 2001-07-28
The last few days I've been trying to convert my tile routines to assembly, and today I finally managed to get the one without clipping working, thanks to Michael Chabot (Frag)
Also made an assembly version of the bounding box collision detection routine.
Frag has also made a new keyhandler for the game, and he'll make a few other routines in asm, like pixel perfect collision detection (will be combined with bounding box to gain speed)
21:50 2001-07-29
Noticed that the bounding box collision detection didn't work as supposed, fixed that :)
Also bugfixed a buffer copying routine I made yesterday to replace PUT (0,0),buffer,PSET
-Sane
Haven't got any this month...
Aethersoft, creators of Zeta and DS4QB2, with a really nice site design, are worthy winners of our Site of the Month award.
As last month, we have no award image for Site of the Month. But in the next issue we'll hopefully have one both for this and next issue. - Ed.
This month it should actually have been "demos of the month", since we're having Qasirs qb demostuffs #1 as demo of the month. When you see all the nice stuff in it, it's quite obvious why it is the demo of the month, but unfortunately it doesn't come with any source code.
Qasirs qb demostuffs #1 is included with the downloadable version as Qsrdem1.zip (you can also download it by clicking the link)
Due to certain reasons, we didn't have a new cultpoll this month.