QB64.org Forum

Active Forums => QB64 Discussion => Topic started by: Dimster on September 17, 2021, 02:01:56 pm

Title: A Question on Sorting
Post by: Dimster on September 17, 2021, 02:01:56 pm
Basically the question is ... is there any real difference if a sort is low to high or high to low?

In this instance I have a sequential file of data. I do not want to use a Quick Sort or Bubble Sort.Nor do I want to create another Array. Neither do I want to use SWAP. All I want is to do is have 5 Variables - High1, High2, High3, High4, High5. Where High1 will hold the highest of all values. Then input one data value at a time, compare it to the values in the various High variables and slot the value accordingly.

So the test becomes:
 "If VALUE > High5, and VALUE > High4 and VALUE > High3 and VALUE > High2 and VALUE > High1 then High1 = VALUE
   If Value > High5 and VALUE > High4 and VALUE > High3 and VALUE > High 2 and VALUE < High1 then High2  = VALUE
... this test eventually winds down to IF VALUE is less than all but High5 then VALUE = High5

Same litany of a test but this time running from High to Low
 " if VALUE  > High1 then High1 = VALUE
 " If VALUE < High1 and VALUE > High2 then High2 = VALUE
...this test eventually winds down to If VALUE < High1 and VALUE < High2 and VALUE < High3 and VALUE < High4 and VALUE > High5 and then High5 = VALUE

Now I realize there needs to be a lot more to the Test to deal with equal values and the possible shuffle of rankings that involves, and the sheer inefficiency in coding lines this type of a sorting involves but the question that comes to my mind is Does it matter, (in terms of speed and accuracy) with this type of sort routine if it runs high to low or low to high? In terms of accuracy, it seems ( seems the operative word here), that if I sort High to Low, initializing the High at zero helps, as does the sort low to high initializing the Low to say 100 if I know there is likely data with a value lower than that.

P.S. My goto for these kind of stupid questions is my Google box. Who informs me to put it in an array and use a quick sort. Other than that advice it has no idea what I'm talking about.



Title: Re: A Question on Sorting
Post by: Petr on September 17, 2021, 02:24:21 pm
Code: QB64: [Select]
  1.  
  2. 'generate 10.000 values and search 5 maximals
  3.  
  4. DIM values(9999) AS INTEGER
  5. DIM max(4) AS INTEGER
  6.  
  7. 'fill array with 10000 values
  8. FOR fill = 0 TO 9999
  9.     values(fill) = fill * 3
  10.  
  11. 'search 5 max
  12. FOR s = 0 TO 4
  13.     max(s) = SearchMax(values())
  14.  
  15. 'show outputs
  16. FOR s = 0 TO 4
  17.     PRINT max(s)
  18.  
  19.  
  20.  
  21.  
  22. FUNCTION SearchMax (values() AS INTEGER)
  23.     done = LBOUND(values)
  24.     DO UNTIL done = UBOUND(values)
  25.         IF SearchMax < values(done) THEN
  26.             SearchMax = values(done): index = done
  27.         END IF
  28.         done = done + 1
  29.     LOOP
  30.     values(index) = 0 'set maximal value to zero for next search
Title: Re: A Question on Sorting
Post by: bplus on September 17, 2021, 04:35:11 pm
Code: QB64: [Select]
  1. _Title "Top 5" 'b+ 2021-09-17
  2. Dim top(4)
  3. For i = 1 To 30 ' 30 randoms 0 to 99
  4.     r = Int(Rnd * 100) ' <<<<<<<<<<<<<<<<<< plug in data here
  5.     Print "New data:"; r: Print
  6.     If r > top(0) Then
  7.         For j = 3 To 0 Step -1
  8.             top(j + 1) = top(j)
  9.         Next
  10.         top(0) = r
  11.     ElseIf r > top(1) Then
  12.         For j = 3 To 1 Step -1
  13.             top(j + 1) = top(j)
  14.         Next
  15.         top(1) = r
  16.     ElseIf r > top(2) Then
  17.         For j = 3 To 2 Step -1
  18.             top(j + 1) = top(j)
  19.         Next
  20.         top(2) = r
  21.     ElseIf r > top(3) Then
  22.         top(4) = top(3)
  23.         top(3) = r
  24.     ElseIf r > top(4) Then
  25.         top(4) = r
  26.     End If
  27.  
  28.     For j = 0 To 4 ' progress
  29.         Print top(j)
  30.     Next
  31.     Print "zzz... press any"
  32.     Sleep
  33.     Cls
  34. Print "Top 5"
  35. For j = 0 To 4
  36.     Print top(j)
  37.  
  38.  
  39.  
Title: Re: A Question on Sorting
Post by: bplus on September 17, 2021, 04:44:53 pm
Now it's a Sub
Code: QB64: [Select]
  1. _Title "Top 5" 'b+ 2021-09-17
  2. Dim Shared top(4)
  3. Dim r(1 To 30)
  4. For i = 1 To 30 ' 30 randoms 0 to 99
  5.     r(i) = Int(Rnd * 100) ' <<<<<<<<<<<<<<<<<< plug in data here
  6. top5 r()
  7. Print "Top 5"
  8. For j = 0 To 4
  9.     Print top(j)
  10.  
  11. Sub top5 (arr())
  12.     For i = LBound(arr) To UBound(arr)
  13.         r = arr(i)
  14.         If r > top(0) Then
  15.             For j = 3 To 0 Step -1
  16.                 top(j + 1) = top(j)
  17.             Next
  18.             top(0) = r
  19.         ElseIf r > top(1) Then
  20.             For j = 3 To 1 Step -1
  21.                 top(j + 1) = top(j)
  22.             Next
  23.             top(1) = r
  24.         ElseIf r > top(2) Then
  25.             For j = 3 To 2 Step -1
  26.                 top(j + 1) = top(j)
  27.             Next
  28.             top(2) = r
  29.         ElseIf r > top(3) Then
  30.             top(4) = top(3)
  31.             top(3) = r
  32.         ElseIf r > top(4) Then
  33.             top(4) = r
  34.         End If
  35.     Next
  36.  
  37.  
Title: Re: A Question on Sorting
Post by: bplus on September 17, 2021, 05:17:21 pm
Generalized Top 5 to Top N, you say how many of the array you want off the top:
Code: QB64: [Select]
  1. _Title "Top N" 'b+ 2021-09-17  Generalize Top 5
  2. ReDim Shared top(0) '
  3. Dim r(1 To 30) ' random data array to test
  4. For i = 1 To 30 ' 30 randoms 0 to 99
  5.     r(i) = Int(Rnd * 100) ' <<<<<<<<<<<<<<<<<< plug in data here
  6. For N = 1 To 10
  7.     topN r(), N
  8.     Print "Top"; N
  9.     For j = 0 To N - 1
  10.         Print top(j)
  11.     Next
  12.     Print
  13.     Print "zzz... press any to contiue until Top 10"
  14.     Sleep
  15.     Cls
  16.  
  17. Sub topN (arr(), N)
  18.     ReDim top(N - 1)
  19.     start = N - 2
  20.     For i = LBound(arr) To UBound(arr)
  21.         r = arr(i)
  22.         fini = 0
  23.         again:
  24.         If r > top(fini) Then
  25.             For j = start To fini Step -1
  26.                 top(j + 1) = top(j)
  27.             Next
  28.             top(fini) = r
  29.             _Continue
  30.         End If
  31.         fini = fini + 1
  32.         If fini < N Then GoTo again
  33.     Next
  34.  
  35.  
Title: Re: A Question on Sorting
Post by: Dimster on September 18, 2021, 10:44:39 am
Thanks you guys. I gather there is no getting away from it, an array needs to be defined, rather than individual variables. And I suppose this implies that with an array it's immaterial if the values are calculated low to high or high to low?

I appreciate the input.
Title: Re: A Question on Sorting
Post by: SMcNeill on September 18, 2021, 11:26:44 am
Regardless of being in an array, or not, there’s not going to be any difference in sorting speeds from high to low, or low to high.

The basic, unfolded logic looks like this:

IF X1 < X2 THEN SWAP X1, X2
IF X1 < X3 THEN SWAP X1, X3
IF X1 < X4 THEN SWAP X1, X4
IF X1 < X5 THEN SWAP X1, X5
IF X2 < X3 THEN SWAP X2, X3
….and so on.

With the above, we have 5 variables (x1, x2, x3, x4, x5), and we swap to place them in lowest to highest order.  The first 4 statements make certain that X1 is the lowest.  The next 3 statements would make certain X2 is the lowest…. and so on.

If we wanted high to low instead, we’d do the EXACT same process except we’d change  < into >.  Same logic.  Same number of steps.  All exactly the same, except we’d change < to >.  There wouldn’t be a single difference in speed and performance between the two.



As an array, the same code would look like:
FOR i = 1 TO 5
    FOR j = i TO 5
        IF X(I) < X(J) THEN SWAP X(I), X(J)
    NEXT
NEXT

Simpler code, but the same basic process controlled by loops to sort from low to high.

And to sort from high to low, what’s needed?

You got it!  Just change that < to >, and that’s it!



Not a bit of difference in the performance between high-to-low and low-to-high one bit! 
Title: Re: A Question on Sorting
Post by: SMcNeill on September 18, 2021, 11:43:04 am
Or, in your specific case that you asked about:

IF Value < High1 THEN
    High5 = High4: High4 = High3: High3 = High2: High2 = High1: High1 = Value
ELSEIF Value < High2 THEN
    High5 = High4: High4 = High3: High3 = High2: High2 = Value
ELSEIF Value < High3 THEN
…. and so on.

The above would be for low to high input.  For high to low, you’d just swap that < to >.  In the end it’d still perform the same.
Title: Re: A Question on Sorting
Post by: TempodiBasic on September 18, 2021, 02:12:10 pm
Hi
my philosophical degression:

is faster from high to low or vice versa?  No one is faster than the other. 10miles are the same both running from left to right both running from right to left.

Your method , that has been showed in a fantastic way by All you QB64 master coders , resembles the bubble sort .
Thinking that at start time the array has all its elements = 0, and going on to put values in ordered mode from Highest to Lowest
I think that this bubble sort can become faster merging it with the binary search.

What do you think about this?
Title: Re: A Question on Sorting
Post by: bplus on September 18, 2021, 02:47:23 pm
I don't think the Bubble sort is good for any case except for ease of understanding of how it works.

Also Binary search requires a sorted list to work.
Title: Re: A Question on Sorting
Post by: SMcNeill on September 18, 2021, 05:34:39 pm
I don't think the Bubble sort is good for any case except for ease of understanding of how it works.

Also Binary search requires a sorted list to work.

Bubble sor….    **QUIVERS UNCONTROLLABLY**

BRRRR!!  I just can’t bring myself to even mention it, if we’re talking about usefulness! 

Still, if I’m completely honest, like most programmers who *hate* its performance, I’ll have to admit to implementing it with very limited datasets just for its simplicity.  I’ll stress here, for emphasis: VERY LIMITED DATASETS.

If I need to sort a deck of cards, then I might fall back onto a bubble sort.  If it’s a grocery list, a bubble sort will do.  If it’s my list of chores for the week, a bubble sort will do.  Usually, any dataset with less than 100 elements in it can be bubble sorted.

Beyond that, I either grab a Binary Insertion Sort, a Quick Sort, or a simple Comb Sort routine and make use of them — for the following reasons:

When the data allows, nothing is faster than a Binary Insertion Sort.  How does this work??

Start with an ordered list of ### items: apple, bat, cat, dog, frog, elephant…
Add a new item to the list: chicken
Do a binary search of your data to find the position where chicken goes: After 3rd word
Shift everything to the right one array element after that and insert the word in its proper place.

One shift of data in memory, coupled with a quick binary position search — it doesn’t get any faster or more efficent than this!  (But, it’s a limited use case sort method.)



Method 2: Quick Sort — works to become the fastest sort method with just about any generalized data set.  The only drawback here is if you recursively sort, I have seen issues with running out of stack space over the years.  Generally, I’ll apply this for moderate sized data sets (say less than 1,000,000,000 data items).



Method 3: Comb Sort —almost as fast as quick sort and still very simple to implement with usually less than 20 lines of code.  Works with all data sets, regardless of size, with no recursion or issues with stack limitations.  If I’m going to take the trouble to copy/paste an existing sort into a different program, *this* is the one I fall back on.  Comb Sort is also what MemSort is based upon, so if you’ve used it from the samples forums here, you’ve used a Comb Sort in your work before.  ;)
Title: Re: A Question on Sorting
Post by: MWheatley on September 18, 2021, 06:40:03 pm
I don't think the Bubble sort is good for any case except for ease of understanding of how it works.


Agreed.  But gosh, it's easy to use in real-world low-number scenarios.

Malcolm
Title: Re: A Question on Sorting
Post by: rickclark58 on September 18, 2021, 08:03:49 pm
For every sorting algorithm, there is a best-case and a worst-case. Generally, the average is a good rule of thumb to go by for general-purpose sorts. They all take a finite amount of time and what you use is largely dependant on the type and quantity of the data. You really need to fit the sort to the data for the best results. https://www.geeksforgeeks.org/time-complexities-of-all-sorting-algorithms/ (https://www.geeksforgeeks.org/time-complexities-of-all-sorting-algorithms/)
Title: Re: A Question on Sorting
Post by: Dimster on September 19, 2021, 08:43:11 am
Wow, thanks again, lots to consider here. That "SWAPPING" algorythm Steve is similar to where I was heading but was using Select Case instead of the "If' statements.
And Tempodi's analogy of the 10 miles Left to Right or Right to Left pretty well highlights my thinking on the difference between a High to Low sort v's a Low to High but in my case I'm thinking ... travelling Right to Left is easier for a right handed person and vice versa. And walking that 10 mile highway going forward of backward makes a major difference. Conceptually, I have a sequential file and want to pick out the data one item at a time. I want to handle that data just once ( one decision on where it fits in a hierarchy) and onto the end of the sequential file. I wanted to avoid creating arrays and doing a sort or multiple swaps with each piece of data (because I have a lot of sequential files)

I have always considered a sort in High to Low order (always looking for the best rather than the worst or walking that 10 km/mile highway forward) as faster than Low to High. Human bias I guess.

Thanks again.
Title: Re: A Question on Sorting
Post by: TempodiBasic on September 19, 2021, 11:18:22 am
HI

about bubblesort
yes it seems to be the oldest and the slowest method of sorting a set of data of the same kind and with a theoretically homogeneus distribution in the range of  possible values. But if you change the kind of distribution the results change,  each sorting method has its ideal condition and its quicksand. For example think of wonderful Quicksort working with this set of data (02134569), here Bubblesort wins easily. While working with this set of data (90654231) I think that QuickSort wins easily vs Bubblesort.


about Dimster question
In the situation described by Dimster, at start point we have no mixed data but all 0 values into the 5 elements' array.
Going on with the input of value, once at time, we must sorting them from the highest to the lowest....after at each new input we must ordering inserting the value in the right position in respect of rule from hightest to lowest.
in simulation:
sorting High      1  2  3  4  5           

at start point    0  0  0  0  0

first = 2            2  0  0  0  0
second = 8       8  2  0  0  0
third = 1           8  2  1  0  0
forth = 6           8  6  2  1  0
fifth = 3            8   6  3  2 1

it seems that Bubblesort idea is the fastest in this approach.

about phylosophy (jump if you're not interested)--------------------------------------------------------
The Protagora's rule teach us that High and Low are relative to the setting and to the metric that we are using. No absolute point of view.
just for an example: going from High to Low is equal to go down or to go worst ,  while going from Low to High is equal to go up or to go better. But just we western humans think High as good and Low as bad....but a lower interest of banch loan is good enough? on the other hand see Yin-Yang idea there is no "high" or "low" but "higher than" and "lower than"
----------------------------------------------------------------------------------------------------------------------------------------------

about sorting algorithm


and
SortDemo.BAS

https://github.com/gondur/dmg8bit/blob/master/SORTDEMO.BAS  to get file


Fine to exchange opinions and share resources
Thanks
Title: Re: A Question on Sorting
Post by: rickclark58 on September 19, 2021, 11:32:22 am
Just to add a bit more to the discussion in relation to sequential files. I am not saying this applies to your problem, but this problem has been around since the beginning. Back in the day, most data was stored as sequential files and as the number of records (an individual line in a sequential file) grew so did the time required to access them. Sequential files are linear so have a big O of O(n). The process was to load a record into memory, search it for the needed data, process it, and then load the next record into memory, and so on. The best case is to find the searched element on the first lookup and the worst case is to find the element on the last lookup, or not at all. The average would be in the middle. This doesn't sound bad unless you have millions of records to search through or very large sequential files. Since these files usually reside on disk, you are having to go through the IO process of the disk (or tape back in the day) so the processing time became quite significant. What was needed was a good way to look up the data in a manner that skipped most of the record on disk and could pull the needed information directly from the record on disk. They came up with indexes.

Indexes are sorted lists of the data in the record with the lookup item as a key and the location of that item on the disk. Since the list is sorted, you can do a binary search on the list. A binary search is O(log n) which is significantly faster than O(n). The difference is "In contrast to O(N) which takes an additional step for each data element, O(log N) means that the algorithm takes an additional step each time the data doubles." -Introduction to Big O Notation.

Most indexes are actually stored as tree structures and you just have to walk down the tree to the needed item if it exists. Trees are actually an implementation of a binary search in reverse so when you traverse a tree you are doing a binary search on the data. Indexes significantly improved data access and even today, every database creates indexes on the data for fast retrieval. If you have a lot of sequential files to go through, building indexes will significantly increase access time. However, there are many free open source databases that you can use via ODBC or dedicated libraries so there is no sense in reinventing the wheel nowadays. Using a local database is the way to go now if you need to process large data sets.
Title: Re: A Question on Sorting
Post by: Dimster on September 19, 2021, 08:25:31 pm
So Tempodi I kind of get where you are heading with the Yin-Yang, Higher than/Lower than and a low interest loan is an example where Low beats High. Reframing my concept to something like Best/Worst or Acceptable/Not Acceptable would be more to the point of agnostic computerize than a humanistic bias towards High being positive and Low negative.

Rick, not quite sure I follow the INDEXES ... are you saying the data values in the Sequential File are pre ordered and an Index is affixed to each sequential record? I guess I could sort each sequential file every time I write a data value to it.

Or is the point that the data values, in the sequential file, remain as originally entered but the Indices somehow identify the hierarchical order?

One of the things I am regretting with the way I have recorded all this damn data, is to keep an accompanying Stats File for each of the Sequential Files. I know now, what I didn't realize back then when I first started the program, that things like a Maximum Value or a Moving Average (etc, etc,etc) could have been recorded at the same time as data entry. As it is now, I need to open multiple files , search data, calculate and present/display results everytime I need to know whats Up with my program. I'm thinking that Stat File maybe akin to what you are describing with INDEXES ???
 
Title: Re: A Question on Sorting
Post by: rickclark58 on September 19, 2021, 09:00:57 pm
The indexes contain key values like an account number. When you save the record, the account number is inserted into the index tree in the proper order. So if the account number is 100 it would be inserted into the tree right after 99. Normally, the records are contained in a block, say 100 records per block. The location of the record in the block is recorded with the account number. So, when you need to load account 100 again, the program looks through the index and finds 100, and then retrieves the location of the record within the block. It would then seek to that location and load the record.

Typically records have a header and the start of the header is actually what is stored in the index. The header is a fixed length, so the program knows how much data to read. After reading the header, the file pointer would be sitting at the beginning of the record. The program would read the record header to find out how long the record is and then load the record based on the length in the header. That way you can have variable length records which will save space and makes it faster to retrieve data.

This is a very simplified explanation and this isn't something that you would code yourself. You could use SQL Lite for instance, and simply query the database for account number 100. Using a database, you can get 1 record or 100 with a simple query, which makes it much easier to manipulate this kind of data. You can do something like "Select * from customers where account_number = 100". If the record is there, it will return it to the program where you can access the data within the record. The star means to grab all the fields in the record. Customers would be a table that contains the data. You can also do things like join multiple tables and return the information as if it were one table. It is quite handy when you get the hang of it and very powerful.

It was quite straightforward to load a database with preexisting data too as there are built-in commands to do such a thing. It is actually quite common in many businesses.  When I worked for Ford Motor Finance, we would get a daily dump from the mainframe that we would load into the Oracle database every day. Once you set it up, it is an automatic process. I am not suggesting you use a database for your data, as I don't really know what you are doing with it. Managing datasets has always been a problem, which is why almost everyone uses a database if they are working with a lot of data. It is just easier.
Title: Re: A Question on Sorting
Post by: SMcNeill on September 19, 2021, 09:40:47 pm
Indexes are key for maintaining large data sets.  Let me give you an overly simplified example:

dog, cat, apple, frog, elephant, + 1,000,000 more items…. (Store this in an array of DIM Words(1,000,005) AS STRING)

Now, let’s say the above is your data set, and you need it sorted alphabetically.  Instead of moving the data itself, you simply create an index and use it.

3, 2, 1, 5, 4, + 1,000,000 more items…. (Store these values in an array of DIM Index(1,000,005) AS LONG)

Now, which is the first word in the list alphabetically?  It’s not Words(1) — that’s “dog”.

But what about Words(Index(1))?

Index(1) holds a value of 3, making that Words(3) — “apple”.  The first element of the index points to the first word alphabetically in the list!

And Words(Index(2)) is “cat”, and Words(Index(3)) is “dog”, and so on!

Your data never moves.  It stays in its original order at all times.  You’re just sorting an Index to the data, rather than rearranging or reordering the data itself — and that can be an extremely powerful database tool!

Think of a list of data with first name, last name, address, state, zip code, phone number….

Now, you could sort and move and reorder that data EVERY time you needed it listed in a certain order…. Your boss likes last names first.  Telemarketing likes sorted by phone numbers.  The mail men want it sorted by address…. An advertiser wants it sorted by zip code….

Now, since you’re maintaining a Fortune 500 database with 397,216,312 customers, are you going to want to take the time to reshuffle ALL that data every time you need it sorted by a different criteria??

NOT IF YOU WANT TO KEEP YOUR JOB!!

For companies like that, maintaining the database — without altering it and possibly corrupting it — is the Holy Grail!  You *don’t* erase a 300 million customer database.  You *don’t* overwrite it.  You *worship* it as your God and *ONLY* add new customers to the end of it!  The customer database can ONLY grow — NOTHING else!!

So then how do you sort it for all those different people and use cases?

By indexes!

DIM FirstName(1,000,000,000) AS LONG — Index sorted by first name.
DIM LastName(1,000,000,000) AS LONG — Index sorted by last name.

You want to know who the first customer is, sorted by first name?  Thats CustomerInfo(FirstName(1)).

Want to know who the first customer is, sorted by last name?  CustomerInfo(LastName(1)).

You reference the data by its position in the index, not by its position in the database itself.
Title: Re: A Question on Sorting
Post by: Dimster on September 20, 2021, 08:05:15 am
Huuuuum... In terms of creating an Index file for one existing Sequential File of say 4000 data values I would either have to figure out how to affix a record number to each data value or calculate the position of each data value within the Sequential file ??????
Title: Re: A Question on Sorting
Post by: SMcNeill on September 20, 2021, 12:00:28 pm
Huuuuum... In terms of creating an Index file for one existing Sequential File of say 4000 data values I would either have to figure out how to affix a record number to each data value or calculate the position of each data value within the Sequential file ??????

Here's a quick little example of an indexed file at work:

Code: QB64: [Select]
  1. Dim word(10) As String
  2. Dim index(10) As Long
  3. $Let SKIPENTRY = TRUE
  4.  
  5.  
  6. $If SKIPENTRY Then
  7.     For i = 1 To 10
  8.         For j = 1 To 5 'create a totally random 5 letter word
  9.             word(i) = word(i) + Chr$(Rnd * 26 + 65) 'these "words" are made up of 5 random letters
  10.         Next
  11.         index(i) = i 'point the initial index to the words as they currently exist
  12.     Next
  13.     For i = 1 To 10
  14.     Print "Give me 10 words to store for you (#"; i; "of 10) =>";
  15.     Input word(i) 'Get any 10 words which you might like from the user
  16.     index(i) = i 'point the initial index to the words as they currently exist
  17.     Next
  18.  
  19.  
  20. 'sort by index
  21. For i = 1 To 10
  22.     For j = 1 To 10
  23.         If word(index(i)) < word(index(j)) Then Swap index(i), index(j)
  24.     Next
  25.  
  26. Print , "Original", "Sorted"
  27. For i = 1 To 10
  28.     Print i, word(i), word(index(i))
  29.  

  [ This attachment cannot be displayed inline in 'Print Page' view ]    [ This attachment cannot be displayed inline in 'Print Page' view ]  

Now, as you can see from the above, I'm leaving my data in its original layout and order.  We're not touching it at all, and we'll always be able to reproduce it faithfully without worry in the future.  I've created a single index here (called Index oddly enough -- who would've thunk it?!), and instead of sorting by the array elements itself, I'm sorting the array by its indexed elements. 

To keep it simple and easy to understand, I've implemented the most basic of sorting routines  here -- a lousy, unoptimized bubble sort, but I'm only using 10 elements in my demo array so that shouldn't be any issue for performance.

If you want to manually enter 10 words yourself, rather than just let the program toss up a bunch of gibberish for our data, simply change that $LET statement to be FALSE and then run the program.  If you want to test with a larger dataset, just make the arrays large enough to hold your data and change the loops end point ot match.  400 elements or 4,000,000,000 elements -- the process to work with them would all more or less be the same.  (The main difference would be the need for a better sorting algorithm with a larger data set, but the concept behind indexing the data would remain the same.)

At this point, you could now save the data to disk (my_data.txt for example) and save the index to disk (my_data.ndx for example), and if you ever needed to search for something alphabetically, you'd just do a quick binary search based on word(index(position))).



It takes a little while for people to get used to the idea of referencing their data by array(index(position)) rather than simply by array(position), but once they've gotten used to the concept, they'll always tell you how powerful a tool it can be when the situation calls for it. 

Small datasets usually don't need an index.  You just sort them on the fly, by whatever criteria you need, and use them as is.  Massive data sets (such as a large company's customer database, or an index to all the books in a library) are almost always upkept with multiple indexes which stay sorted all the time. 

Indexing is definitely a nice skill/tool to learn to work with, just in case you ever find yourself in a situation where working with your data becomes too time consuming otherwise.  ;)
Title: Re: A Question on Sorting
Post by: Dimster on September 21, 2021, 03:49:12 pm
Thanks Steve - definitely going to sort out Indexing. I appreciate the info.
Title: Re: A Question on Sorting
Post by: TempodiBasic on September 22, 2021, 05:01:16 am
Hi
Thanks to Steve for his code demo, I hope he doesn't mind if I have enlarged (modded) his demo to stress the advantages of working with indexes. So I just  added more indexes to show how to manage the thing.
Code: QB64: [Select]
  1. Const Max = 15
  2. Dim word(Max) As String
  3. Dim As Long index(Max), indexZA(Max), indexLENmin(Max)
  4. $Let SKIPENTRY = TRUE
  5.  
  6.  
  7. $If SKIPENTRY Then
  8.     For i = 1 To Max
  9.         For j = 1 To (1 + Rnd * 9) 'create a totally random 1-10 letter word
  10.             word(i) = word(i) + Chr$(Rnd * 26 + 65) 'these "words" are made up of 5 random letters
  11.         Next
  12.         index(i) = i 'point the initial index to the words as they currently exist
  13.     Next
  14.     FOR i = 1 TO max
  15.     PRINT "Give me 10 words to store for you (#"; i; "of ";max;"10) =>";
  16.     INPUT word(i) 'Get any ";max;" words which you might like from the user
  17.     index(i) = i 'point the initial index to the words as they currently exist
  18.     NEXT
  19.  
  20. 'copy the intial indexes
  21. For a = 1 To Max
  22.     indexZA(a) = index(a)
  23.     indexLENmin(a) = index(a)
  24.  
  25. 'sort by index  A-Z
  26. For i = 1 To Max
  27.     For j = 1 To Max
  28.         If word(index(i)) < word(index(j)) Then Swap index(i), index(j)
  29.     Next
  30.  
  31. 'sort by index  Z-A
  32. For i = 1 To Max
  33.     For j = 1 To Max
  34.         If word(indexZA(i)) > word(indexZA(j)) Then Swap indexZA(i), indexZA(j)
  35.     Next
  36.  
  37. 'sort by index  Lenght of word   from short to longer
  38. For i = 1 To Max
  39.     For j = 1 To Max
  40.         If Len(word(indexLENmin(i))) > Len(word(indexLENmin(j))) Then Swap indexLENmin(i), indexLENmin(j)
  41.     Next
  42.  
  43. Print , "Original", "Sorted A-Z", "Sorted Z-A", "Sorted by Lenght"
  44. For i = 1 To Max
  45.     Print i, word(i), word(index(i)), word(indexZA(i)), word(indexLENmin(i))
  46.  
  47.  

and here the results
  [ This attachment cannot be displayed inline in 'Print Page' view ]  

Title: Re: A Question on Sorting
Post by: SMcNeill on September 22, 2021, 05:23:22 am
I don’t mind the modding at all, Tempodi.  😘

Only thing I want to point out is the fact that you don’t need a separate index to go from A-Z to Z-A sorting.  Just read your index in reverse….

Instead of:

    PRINT i, word(i), word(index(i)), word(indexZA(i)), word(indexLENmin(i))

Use:

    PRINT i, word(i), word(index(i)), word(index(Max - i + 1)), word(indexLENmin(i))


If your index is storing the order from A-Z, reading that index backwards will give you the same list from Z-A.



indexZA() is good to use just as an example, but in most real world use cases, you’d just use  the original index itself.

index(i) is the A to Z order from 1 to Max

index(Max - i + 1) is simply the inverse of that order, so it’s Z to A.


Title: Re: A Question on Sorting
Post by: TempodiBasic on September 24, 2021, 07:56:36 pm
Hi Steve
thanks for tips
it is important to have in mind to use less resources  to do the task so
I have corrected my homework mod with this:
Code: QB64: [Select]
  1. Const Max = 15
  2. Dim word(Max) As String
  3. Dim As Long index(Max), indexLENmin(Max)
  4. $Let SKIPENTRY = TRUE
  5.  
  6.  
  7. $If SKIPENTRY Then
  8.     For i = 1 To Max
  9.         For j = 1 To (1 + Rnd * 9) 'create a totally random 1-10 letter word
  10.             word(i) = word(i) + Chr$(Rnd * 26 + 65) 'these "words" are made up of 5 random letters
  11.         Next
  12.         index(i) = i 'point the initial index to the words as they currently exist
  13.     Next
  14.     FOR i = 1 TO max
  15.     PRINT "Give me 10 words to store for you (#"; i; "of ";max;"10) =>";
  16.     INPUT word(i) 'Get any ";max;" words which you might like from the user
  17.     index(i) = i 'point the initial index to the words as they currently exist
  18.     NEXT
  19.  
  20. 'copy the intial index
  21. For a = 1 To Max
  22.     indexLENmin(a) = index(a)
  23.  
  24. 'sort by index  A-Z
  25. For i = 1 To Max
  26.     For j = 1 To Max
  27.         If word(index(i)) < word(index(j)) Then Swap index(i), index(j)
  28.     Next
  29.  
  30. 'sort by index  Z-A
  31. ' no needed because it is the reverse of Z-A
  32.  
  33. 'sort by index  Lenght of word   from the shortest to longestest
  34. For i = 1 To Max
  35.     For j = 1 To Max
  36.         If Len(word(indexLENmin(i))) > Len(word(indexLENmin(j))) Then Swap indexLENmin(i), indexLENmin(j)
  37.     Next
  38.  
  39. ' sort by index length of word from the longest to te shortest
  40. ' no needed because it is the reverse of   from the shortest to the longest
  41.  
  42. Print "    "; "Original"; "   "; "Sorted A-Z"; "   "; "Sorted Z-A"; "   "; "by Shortest"; "   "; "by Longest"
  43. For i = 1 To Max
  44.     Locate , 1:
  45.     Print i;
  46.     Locate , 5: Print word(i);
  47.     Locate , 20: Print word(index(i));
  48.     Locate , 31: Print word(index(Max - i + 1));
  49.     Locate , 42: Print word(indexLENmin(i));
  50.     Locate , 58: Print word(indexLENmin(Max - i + 1))
  51.     Sleep 1
  52.  
 [ This attachment cannot be displayed inline in 'Print Page' view ]