Author Topic: Fun with n-Grams  (Read 2531 times)

0 Members and 1 Guest are viewing this topic.

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Fun with n-Grams
« on: April 20, 2019, 11:03:39 pm »
Code: QB64: [Select]
  1. _TITLE "Ngrams 1" 'started 2019-04-20 B+
  2. ' ref: Daniel Shiffman Coding Challenge #42.1 Markov Chains Part 1
  3. ' [youtube]https://www.youtube.com/watch?v=eGFJ8vugIWA&list=PLRqwX-V7Uu6ZiZxtDDRCi6uhfTH4FilpH[/youtube]&index=52
  4.  
  5. ' Basically take some text, collect all unique nGrams of a certain length, the gram.
  6. ' Collect all the letters that follow a gram in the text in nextChar string.
  7. ' Create a new random text with same nGrams based on random selection of following chars.
  8.  
  9. SCREEN _NEWIMAGE(800, 720, 32)
  10.  
  11. 'setup the grams
  12. CONST ngLen = 6
  13. TYPE nGramType
  14.     gram AS STRING * NGLEN
  15.     nextChar AS STRING
  16. REDIM SHARED grams(1 TO 1000) AS nGramType, nextGI
  17. nextGI = 1
  18.  
  19. 'get some text, this might be fun
  20. OPEN "Pete replies.txt" FOR INPUT AS #1
  21.     LINE INPUT #1, t$
  22.     txt$ = txt$ + t$
  23. 'PRINT txt$
  24.  
  25. 'Load the grams array
  26. FOR i = 1 TO LEN(txt$) - ngLen - 1
  27.     ngCandidate$ = MID$(txt$, i, ngLen)
  28.     'PRINT "ngCandidate$ "; ngCandidate$
  29.     found = find(ngCandidate$)
  30.     'PRINT "found "; found
  31.     IF found = 0 THEN
  32.         'add to grams
  33.         grams(nextGI).gram = ngCandidate$
  34.         grams(nextGI).nextChar = MID$(txt$, i + ngLen, 1)
  35.         'sort grams
  36.         QSort 1, nextGI
  37.         'FOR i = 1 TO nextGI
  38.         '    PRINT grams(i).gram, grams(i).nextChar
  39.         'NEXT
  40.         nextGI = nextGI + 1
  41.         IF nextGI > UBOUND(grams) THEN REDIM _PRESERVE grams(1 TO UBOUND(grams) + 1000) AS nGramType
  42.     ELSE
  43.         grams(found).nextChar = grams(found).nextChar + "~" + MID$(txt$, i + ngLen, 1)
  44.     END IF
  45.     'INPUT "OK press enter..."; wate$
  46. 'check load
  47. 'FOR i = 1 TO nextGI
  48. '    PRINT grams(i).gram, grams(i).nextChar
  49. '    IF i MOD 20 = 0 THEN
  50. '        PRINT: INPUT "OK, press enter...", wate$
  51. '        CLS
  52. '    END IF
  53. 'NEXT
  54.  
  55. 'generate random text
  56. 'find a gram that starts with a Capital letter, dang it's always the same start!
  57. i = INT(RND * (nextGI - 1)) + 1
  58. WHILE rtxt$ = ""
  59.     IF INSTR("ABCDEFGHIJKLMNOPQRSTUVWXYZ", LEFT$(grams(i).gram, 1)) > 0 THEN
  60.         rtxt$ = grams(i).gram
  61.         IF LEN(grams(i).nextChar) = 1 THEN
  62.             rtxt$ = rtxt$ + grams(i).nextChar
  63.         ELSE
  64.             REDIM select$(0)
  65.             Split1000 grams(i).nextChar, "~", select$()
  66.             r = INT(RND * (UBOUND(select$) + 1))
  67.             rtxt$ = rtxt$ + select$(r)
  68.         END IF
  69.     END IF
  70.     i = i + 1
  71.     IF i > nextGI - 1 THEN i = 10
  72.  
  73. 'PRINT rtxt$, LEN(rtxt$)
  74. WHILE LEN(rtxt$) < 4000
  75.     g$ = MID$(rtxt$, LEN(rtxt$) - ngLen + 1)
  76.     found = find(g$)
  77.     IF found > 0 THEN
  78.         IF LEN(grams(found).nextChar) = 1 THEN
  79.             rtxt$ = rtxt$ + grams(found).nextChar
  80.         ELSE
  81.             REDIM select$(0)
  82.             Split1000 grams(found).nextChar, "~", select$()
  83.             r = INT(RND * (UBOUND(select$) + 1))
  84.             rtxt$ = rtxt$ + select$(r)
  85.         END IF
  86.     ELSE
  87.         rtxt$ = rtxt$ + " "
  88.     END IF
  89. PRINT rtxt$
  90.  
  91. SUB QSort (Start, Finish) 'shownK needs to be shared array
  92.     i = Start
  93.     j = Finish
  94.     m$ = grams(INT((i + j) / 2)).gram
  95.     WHILE i <= j
  96.         WHILE grams(i).gram < m$
  97.             i = i + 1
  98.         WEND
  99.         WHILE grams(j).gram > m$
  100.             j = j - 1
  101.         WEND
  102.         IF i <= j THEN
  103.             SWAP grams(i), grams(j)
  104.             i = i + 1
  105.             j = j - 1
  106.         END IF
  107.     WEND
  108.     IF j > Start THEN QSort Start, j
  109.     IF i < Finish THEN QSort i, Finish
  110.  
  111. FUNCTION find (x$)
  112.     IF nextGI = 1 THEN EXIT FUNCTION
  113.     low = 1: hi = nextGI - 1
  114.     WHILE low <= hi
  115.         test = (low + hi) \ 2
  116.         IF grams(test).gram = x$ THEN
  117.             find = test: EXIT FUNCTION
  118.         ELSE
  119.             IF grams(test).gram < x$ THEN low = test + 1 ELSE hi = test - 1
  120.         END IF
  121.     WEND
  122.  
  123. 'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
  124. SUB Split1000 (mystr AS STRING, delim AS STRING, arr() AS STRING)
  125.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  126.     ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  127.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  128.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
  129.  
  130.     copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
  131.  
  132.     'special case if delim is space, probably want to remove all excess space
  133.     IF delim = " " THEN
  134.         copy = RTRIM$(LTRIM$(copy))
  135.         p = INSTR(copy, "  ")
  136.         WHILE p > 0
  137.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  138.             p = INSTR(copy, "  ")
  139.         WEND
  140.     END IF
  141.     curpos = 1
  142.     arrpos = 0
  143.     LD = LEN(delim) 'mod
  144.     dpos = INSTR(curpos, copy, delim)
  145.     DO UNTIL dpos = 0
  146.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  147.         arrpos = arrpos + 1
  148.         IF arrpos > UBOUND(arr) THEN REDIM _PRESERVE arr(UBOUND(arr) + 1000) AS STRING
  149.         curpos = dpos + LD
  150.         dpos = INSTR(curpos, copy, delim)
  151.     LOOP
  152.     arr(arrpos) = MID$(copy, curpos)
  153.     REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
  154.  
  155.  

"Pete replies.txt" file, copy and paste into file or find your own text to play with.
Quote
I'm just glad I can count on The Rho to fill in for me when I'm too busy not being a misogynist to reply to a post. Men and women are different biologically and psychologically, period. Women don't gravitate to programming like men do Probably math has something to do with that, but it probably goes deeper. It's just fun to point out the obvious math part. I ran into this same question when I coached golf. How can we get more girls interested in the game? Well, it's simple, change the game. Girls wanted it to be more a social outing. Guys, on the other hand, play real golf! Kidding aside, guys like playing more to compete. They socialize, too, but to a lesser degree. It's more about shot-making than. I mean you could probably pick 10 similarities men and women golfers have in common, but they would rank radically different in importance. If any girls or women want to post or visit the forums, great, but be advised it is male dominate, period. That's not going to change unless more females get interested in coding.  Frankly, if enough females wee coding back in the QBasic age, I would have put one in charge of a ladies sub-forum. I wouldn't exclude males from posting, but I would want to see it develop in a way that was somehow different than the predominately, nearly all male forum. It's a bit of a tangent, but all this parity BS of today is not for the sake of equality, it's so liberals can suck votes out of simpletons to keep their liberal communist dream for power alive and moving forward. It's a lot less less work than coming up with actual beneficial ideas for a free society, because then there would be less pathetic losers looking for handouts. In the U.S., already more than 50% of the workforce is female; so now Bolsheviks want to try and make people believe there is income inequality. Well duh. If most of the workforce was male before, there is this little thing called seniority that means males, who have been at the job longer, are being paid higher. When that changes, either something else will replace it like women aren't getting enough respect, or some other BS "inequality" topic will be all the fake news talks about 24/7. Anyway, my gut tells me Qwerkey is just asking about ways to get more females interested in coding, for non-political reasons. Men and women are different, and it's nice to have a larger number of your gender to relate to on social platforms. There are other good reasons, too, such as the practical aspects of knowing how to program, including good jobs. What's cool about places today that are free capitalistic governments is that all of these opportunities are available, but if psychological makeup (not made by the Covergirl corporation) has anything to say about it, you'd probably have to get pretty inventive with the Woman's club concept I touched on, earlier, to make it happen. Well then so much for the fake news bozos, who would love women to think they should get involved with programming. My take on it is even if more women did, they'd be too high maintenance for most companies, requiring way too much FORTRAN before any real coding got done.

PS for creative word creation try setting
Code: QB64: [Select]
  1. CONST ngLen = 6

the gram length to a shorter 3 or 4.
« Last Edit: April 21, 2019, 08:36:23 am by bplus »