Author Topic: Help with an array  (Read 10917 times)

Offline Larryrl

  • Newbie
  • Posts: 26
    • View Profile
Help with an array
« on: April 27, 2020, 12:48:20 am »
I have some arrays for a program, as shown in the code below.


Code: QB64: [Select]
  dim language$(200000,3),ldup$(200000),edup$(200000)

Now I need a loop or something similar to search the array and find duplicate entries. It needs to check every index against every other index of the language$ array for column 1 and also for column 2. If it finds more than one entry for a word in column 1, the word should be put into the next unused index in the ldup$ array; if it finds duplicates in column 2, the word should be put into the next unused index in the edup$ array. This program works with constructed languages, or conlangs. I used to do this in MS Excel, which has a highlight-cells function that colors in the duplicate values. I do not need the color part; I just want the duplicated words to go into the proper ldup$ or edup$ array, depending on whether it is a language word or an English word.

This is the only part of the program that is still not functional. To be honest, it is not even written in QB64. But when I can't see a solution past all of the Windows controls and such in some of the other dialects of BASIC for Windows, I always come back to normal BASIC like QB64. If necessary, I can modify it and get it to work in the language it is in once I get some idea of how to begin.

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
    • View Profile
Re: Help with an array
« Reply #1 on: April 27, 2020, 05:15:02 am »
This is the same as post: https://www.qb64.org/forum/index.php?topic=2469.0

I think this would demo in QB64 what you are trying to achieve in your Liberty Basic routine.

Code: QB64: [Select]
  DIM language$(20, 3), ldup$(20), edup$(20)
  language$(1, 1) = "cat"
  language$(2, 1) = "dog"
  language$(3, 1) = "turtle"
  language$(4, 1) = "frog"
  language$(5, 1) = "bird"
  language$(1, 2) = "cat"
  language$(2, 2) = "rabbit"
  language$(3, 2) = "dog"
  language$(1, 3) = "frog"
  language$(2, 3) = "snake"

  ' Join every non-empty entry into one "|"-delimited string.
  FOR i = 1 TO 20
      FOR j = 1 TO 3
          IF language$(i, j) <> "" THEN
              concat$ = concat$ + language$(i, j) + "|"
          END IF
      NEXT j
  NEXT i
  ' PRINT concat$
  seed = 1
  DO
      ' Pull out the next word, then move seed past its delimiter.
      x$ = MID$(concat$, seed, INSTR(seed, concat$, "|") - seed)
      ' PRINT x$
      oldseed = seed
      seed = INSTR(seed, concat$, "|") + 1
      ' Note: INSTR matches substrings, so "cat|" is also found inside "bobcat|".
      IF INSTR(MID$(concat$, seed), x$ + "|") THEN
          IF INSTR(MID$(concat$, 1, oldseed), x$ + "|") = 0 THEN
              ' Duplicate word found.
              cnt1 = cnt1 + 1: ldup$(cnt1) = x$
              dupconcat$ = dupconcat$ + x$ + "|"
          END IF
      ELSE
          IF INSTR(dupconcat$, x$) = 0 THEN
              ' Unique word found.
              cnt2 = cnt2 + 1: edup$(cnt2) = x$
          END IF
      END IF
  LOOP UNTIL seed > LEN(concat$)

  ' Okay, let's see the results.
  PRINT "[Duplicate Words]"
  FOR i = 1 TO cnt1
      PRINT ldup$(i)
  NEXT
  PRINT "[Unique Words]"
  FOR i = 1 TO cnt2
      PRINT edup$(i)
  NEXT

Want to learn how to write code on cave walls? https://www.tapatalk.com/groups/qbasic/qbasic-f1/

Offline Larryrl

  • Newbie
  • Posts: 26
    • View Profile
Re: Help with an array
« Reply #2 on: April 27, 2020, 01:20:59 pm »

Checking for duplicate words problem solved!

Code: QB64: [Select]
  ' tw is assumed to hold the number of words currently in language$.
  lindex=0:eindex=0
  i=0

  while i<tw ' stop at tw; "i<tw+1" would also process index tw+1
      i=i+1
      for j=1 to tw
          if i<>j and language$(i,1)=language$(j,1) then lindex=lindex+1:ldup$(lindex)=language$(j,1)
          if i<>j and language$(i,3)=language$(j,3) then eindex=eindex+1:edup$(eindex)=language$(j,1)
      next j
  wend


Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
    • View Profile
Re: Help with an array
« Reply #3 on: April 27, 2020, 02:43:05 pm »
I'd be interested to see how long that takes if you have the 200,000 elements your DIM statement indicates. Using two loops will require 200,000 x 200,000 or 40,000,000,000 passes to complete. That's why I prefer using INSTR; however, if it's just a one-time sort procedure, yep, I've had those too. You just let the app run until the job gets done. How much faster would INSTR be for something of this magnitude? I have no idea. It would be interesting to test out.

I did notice your post does not filter out placing duplicates into the new array multiple times. In other words, if cat occurred 30 times in the language array, it would be placed in the dup array over and over (once for every matching pair of entries) instead of just once to indicate it is present more than once in the language array. If that isn't what you intended, you would need to loop through each of the dup arrays and only add the duplicate entry if it wasn't already present. Using looping, that would greatly magnify the number of passes needed to complete the task. Again, using INSTR, even in that limited capacity, would speed up that filtering process.

Pete

Offline TempodiBasic

  • Forum Resident
  • Posts: 1792
    • View Profile
Re: Help with an array
« Reply #4 on: April 27, 2020, 05:14:08 pm »
@Pete

yes, math is not an opinion
Quote
Using two loops will require 200,000 x 200,000 or 40,000,000,000 passes to complete

but is there another way to get the result without this search?

@Larryrl
but why did you leave out
Code: QB64: [Select]
  language$(i,2)=language$(j,2)
?
Programming isn't difficult, only it's  consuming time and coffee

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: Help with an array
« Reply #5 on: April 27, 2020, 06:00:14 pm »
j doesn't have to start at 1, it should start at i + 1
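
A minimal sketch of that idea, with the sample words and the value of tw made up for illustration. It starts j at i + 1 and, as an extra step beyond the suggestion above, records each duplicated word only once:

```qb64
' Sketch only: tw and the sample words are assumptions for illustration.
DIM language$(10, 3), ldup$(10)
tw = 6
language$(1, 1) = "cat": language$(2, 1) = "dog": language$(3, 1) = "cat"
language$(4, 1) = "frog": language$(5, 1) = "cat": language$(6, 1) = "dog"

lindex = 0
FOR i = 1 TO tw - 1
    FOR j = i + 1 TO tw ' each pair (i, j) with j > i is visited once
        IF language$(i, 1) = language$(j, 1) THEN
            ' Record the word only if it is not already in ldup$.
            found = 0
            FOR k = 1 TO lindex
                IF ldup$(k) = language$(i, 1) THEN found = -1: EXIT FOR
            NEXT k
            IF found = 0 THEN lindex = lindex + 1: ldup$(lindex) = language$(i, 1)
            EXIT FOR ' one match is enough to know word i is duplicated
        END IF
    NEXT j
NEXT i

FOR k = 1 TO lindex
    PRINT ldup$(k)
NEXT k
```

With this sample data the sketch prints cat and then dog. It still compares on the order of n * n / 2 pairs in the worst case, so it only halves the work of the full double loop.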

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
    • View Profile
Re: Help with an array
« Reply #6 on: April 27, 2020, 06:51:17 pm »
Quote
j doesn't have to start at 1, it should start at i + 1

Good point, Mark, provided the user isn't counting the number of times a word is duplicated, only that it is a duplicate. Using your i + 1 logic, the filter is always advancing forward without needless repetition, again provided the results do not depend on the number of occurrences.


Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: Help with an array
« Reply #7 on: April 27, 2020, 08:35:42 pm »
You know if the j word has already been ID'd as a duplicate...

Oh, the array should be sorted first! That would shorten the task considerably!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: Help with an array
« Reply #8 on: April 27, 2020, 08:43:36 pm »
Wouldn't the quick way be to sort the arrays first?  Then it's a case of minimal comparisons -- like so:

LIST 1    LIST 2
Apple     Apple
Bat       Apple
Cat       Bat
Frog      Cat

To start with, you compare list1(0) to list2(0).   Apple matches Apple.
Then compare to list2(1).  Apple still matches the second Apple.
Then compare to list2(2).  Apple does NOT match Bat.  No need to compare further.

Now, compare list1(1) to list2(2).  Bat matches Bat.
Then compare to list2(3).  Bat doesn't match Cat.  No need to compare further.


Very minimal condition checking is needed once the arrays are sorted.
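
The walk-through above could be sketched roughly as follows. This is an illustration only: the array names list1$ and list2$ follow the post, the upper bound of 3 is hard-coded for the four sample words, and it assumes both lists are already sorted and that list1$ has no repeated entries.

```qb64
' Sketch only: compares two already-sorted lists in a single forward pass.
DIM list1$(0 TO 3), list2$(0 TO 3)
list1$(0) = "Apple": list1$(1) = "Bat": list1$(2) = "Cat": list1$(3) = "Frog"
list2$(0) = "Apple": list2$(1) = "Apple": list2$(2) = "Bat": list2$(3) = "Cat"

j = 0
FOR i = 0 TO 3
    ' Skip list2 entries that sort before list1(i).
    ' The bounds test is kept apart from the array access because AND
    ' evaluates both of its operands.
    DO WHILE j <= 3
        IF list2$(j) >= list1$(i) THEN EXIT DO
        j = j + 1
    LOOP
    ' Count the run of list2 entries equal to list1(i).
    matches = 0
    DO WHILE j <= 3
        IF list2$(j) <> list1$(i) THEN EXIT DO
        matches = matches + 1
        j = j + 1
    LOOP
    PRINT list1$(i); " matches"; matches
NEXT i
```

Because j only ever moves forward, each element of either list is examined a small, fixed number of times. For the sample data the counts come out as 2 for Apple, 1 for Bat, 1 for Cat and 0 for Frog.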
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
    • View Profile
Re: Help with an array
« Reply #9 on: April 27, 2020, 08:50:48 pm »
If you are starting with raw text files with simple formatting, the Unix toolkit (grep, sed, awk, etc.) can sort and remove duplicates in an instant.
You're not done when it works, you're done when it's right.

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
    • View Profile
Re: Help with an array
« Reply #10 on: April 27, 2020, 09:04:43 pm »
Get your own animals, Steve. Stop using mine!

Pete :D

Offline codeguy

  • Forum Regular
  • Posts: 174
    • View Profile
Re: Help with an array
« Reply #11 on: April 28, 2020, 12:51:33 am »
I would highly recommend sorting at LEAST one entry list; that way you can use binary search, which will cut the accesses needed to find duplicates by a considerable margin. At most, you'd search about 20 times per entry versus an average of 100,000 per entry. You'd cut the required operations from 100,000*200,000 to 20*200,000, a vast improvement. Perhaps my posted sorting library could come in quite handy for this task. 20,000,000,000 versus 4,000,000 operations is a marked improvement: 5,000 times faster, if my mental math isn't off. At 20,000,000,000 seek/compare operations, you could age considerably under the proposed solution of searching every element in one dimension of the array against the elements in other parts of the same array. This would be like scanning numerous texts for the words they have in common. A thankless task if you try solving it by brute force. The binary search is a powerful weapon.
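
For readers who have not used it, a binary search over a sorted string array might look roughly like the sketch below. The function name BinSearch&, the array name sorted$, and the sample words are all made up for illustration; this is not taken from the sorting library mentioned above.

```qb64
' Sketch only: sorted$ must already be in ascending order.
DIM SHARED sorted$(1 TO 6)
sorted$(1) = "apple": sorted$(2) = "bat": sorted$(3) = "cat"
sorted$(4) = "dog": sorted$(5) = "frog": sorted$(6) = "turtle"

PRINT BinSearch&("dog", 1, 6) ' index 4: "dog" is present
PRINT BinSearch&("eel", 1, 6) ' 0: "eel" is not in the list

' Returns the index of target$ within sorted$(lowIndex& TO highIndex&), or 0 if absent.
FUNCTION BinSearch& (target$, lowIndex&, highIndex&)
    low& = lowIndex&: high& = highIndex&
    BinSearch& = 0
    DO WHILE low& <= high&
        middle& = (low& + high&) \ 2
        IF sorted$(middle&) = target$ THEN
            BinSearch& = middle&: EXIT FUNCTION
        ELSEIF sorted$(middle&) < target$ THEN
            low& = middle& + 1 ' target can only be in the upper half
        ELSE
            high& = middle& - 1 ' target can only be in the lower half
        END IF
    LOOP
END FUNCTION
```

Each pass halves the remaining range, so 200,000 sorted entries need at most 18 probes per lookup, in line with the rough figure of 20 quoted above.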
« Last Edit: April 28, 2020, 12:57:49 am by codeguy »

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: Help with an array
« Reply #12 on: April 28, 2020, 01:13:43 am »
Once sorted, all the dups would be stacked one upon the other. There's nothing to search: just count up until the next item appears, then start another count for it...

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Help with an array
« Reply #13 on: April 28, 2020, 11:16:38 am »
On trying to find duplicates, I'm not sure how to do that with strings, but I have had to figure out how many times a particular value repeats in a given set of data values. Here is the approach I've used with success: basically, add your data value as a decimal fraction to the counter. For example:
Code: QB64: [Select]
  DIM Main(1 TO 20)
  DIM Duplicate(1 TO 20) ' Main holds the data, e.g. 5, 20, 19, 5, 18, 17, etc.
  FOR x = 1 TO 20
      TargetNum = Main(x)
      Dcount = 0 ' restart the count for each target value
      FOR y = 1 TO x ' count occurrences up to and including position x
          IF Main(y) = TargetNum THEN Dcount = Dcount + 1
      NEXT y
      Duplicate(x) = Dcount + (Main(x) * .01)
  NEXT x

The results in Duplicate are then 1.05, 1.20, 1.19, 2.05, 1.18, 1.17, etc.
Any value greater than or equal to 2 is a duplicate.

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: Help with an array
« Reply #14 on: April 28, 2020, 01:24:14 pm »
Inspired by Qwerkey's B-Day problem https://www.qb64.org/forum/index.php?topic=2514.0

Here is code that finds duplicate B-days for 100 people, each assigned a random day:
Code: QB64: [Select]
  _TITLE "Duplicate B-Days"

  DIM SHARED sa$(0 TO 100)
  FOR b = 1 TO 100
      ' Random day 1..365, right-aligned to 3 characters so string order matches numeric order.
      sa$(b) = RIGHT$(SPACE$(3) + STR$(INT(RND * (365)) + 1), 3)
  NEXT
  QSort 1, 100
  PRINT "Duplicates B-Days for 100 people:"
  FOR i = 1 TO 100
      count = 0
      PRINT i; ":";
      FOR j = i + 1 TO 100
          IF sa$(i) = sa$(j) THEN
              count = count + 1
          ELSE
              EXIT FOR
          END IF
      NEXT
      ' Report after the inner loop so a run that reaches item 100 is still counted.
      PRINT _TRIM$(STR$(count)),
      tDups = tDups + count
      i = i + count '<<<<<<<<<<<<<<< skip over what has been counted
  NEXT
  PRINT: PRINT "Total Dups:"; tDups


  SUB QSort (Start, Finish) 'sa$ needs to be SHARED  array
      DIM i AS INTEGER, j AS INTEGER, x$, a$
      i = Start
      j = Finish
      x$ = sa$(INT((i + j) / 2)) ' pivot: the middle element
      WHILE i <= j
          WHILE sa$(i) < x$
              i = i + 1
          WEND
          WHILE sa$(j) > x$
              j = j - 1
          WEND
          IF i <= j THEN
              a$ = sa$(i)
              sa$(i) = sa$(j)
              sa$(j) = a$
              i = i + 1
              j = j - 1
          END IF
      WEND
      IF j > Start THEN QSort Start, j
      IF i < Finish THEN QSort i, Finish
  END SUB

If the total is 12, that works out to about 12% of the 100 people.