Topic: Text Corrector

PoliMi
Text Corrector
« on: October 12, 2021, 04:24:29 pm »
Text Corrector corrects writing errors while the user is composing a text. It doesn't require a vocabulary file; instead, it learns any language from a given sample text. The algorithm not only extracts the words from that sample, but also extracts the links between consecutive words and analyzes the probabilistic features of the language. Based on the collected data, typing errors are then modeled as Gaussian distributions whose mean is the correct value.
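
As a rough illustration of the Gaussian error model, here is a minimal Python sketch (not the QB64 program itself). It assumes a standard deviation of 2, as in the listing's `probability` function, and swaps the hard-coded lookup table for `math.erf`. The names `normal_cdf` and `length_score` are hypothetical.

```python
import math

def normal_cdf(x, mean=0.0, sd=2.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def probability(value, mean, sd=2.0):
    """Probability mass of the unit-wide bin centred on `value`."""
    return normal_cdf(value + 0.5, mean, sd) - normal_cdf(value - 0.5, mean, sd)

def length_score(typed_len, vocab_len):
    """Hypothetical helper: normalised so that equal lengths score exactly 1."""
    return probability(typed_len, vocab_len) / probability(0, 0)

# A typed word of 5 letters compared with vocabulary words of 5, 6 and 7 letters
for vocab_len in (5, 6, 7):
    print(vocab_len, round(length_score(5, vocab_len), 3))
```

The score falls off smoothly as the two lengths move apart, so a word that is one letter too long or too short is penalised only mildly.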

While the user is typing a new word, the algorithm predicts which word the user intends to write by evaluating the affinity of each vocabulary word, which is a weighted mean of three parameters:
- LENGTH(i): the probabilistic distance between the length of the typed word and the length of the i-th vocabulary word
- CONTEXT(i): the estimated frequency with which the i-th vocabulary word follows the last typed word
- STRUCTURE(i, j): the probabilistic distance of the j-th letter of the typed word from the j-th position in the i-th vocabulary word
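
The three parameters can be sketched in Python as follows. This is only an approximation of the QB64 listing under stated assumptions: the weights 0.15 / 0.30 / 0.55 are the base values passed to `computeWordAffinity`, the Gaussian uses a standard deviation of 2, and the `followers` table and all function names are hypothetical.

```python
import math

def bin_probability(value, mean, sd=2.0):
    """Gaussian probability mass of the unit-wide bin centred on `value`."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    return cdf(value + 0.5) - cdf(value - 0.5)

PEAK = bin_probability(0, 0)  # mass of the bin at the mean, used to normalise to 1

def length_score(typed, candidate):
    """LENGTH: 1 when both words have the same length, lower otherwise."""
    return bin_probability(len(typed), len(candidate)) / PEAK

def char_distance(ch, word, pos):
    """Offset of the nearest occurrence of `ch` around 1-based `pos`, or 10 if none."""
    limit = min(max(pos - 1, len(word) - pos), 10)
    for offset in range(limit + 1):
        for index in (pos - offset, pos + offset):
            if 1 <= index <= len(word) and word[index - 1] == ch:
                return offset
    return 10

def structure_score(typed, candidate):
    """STRUCTURE: average closeness of each typed letter to its expected position."""
    if not typed:
        return 0.0
    total = sum(bin_probability(char_distance(ch, candidate, i), 0)
                for i, ch in enumerate(typed, start=1))
    return total / (len(typed) * PEAK)

def context_score(previous, candidate, followers):
    """CONTEXT: share of the occurrences of `previous` that `candidate` follows."""
    counts = followers.get(previous, {})
    total = sum(counts.values())
    return counts.get(candidate, 0) / total if total else 0.0

def affinity(typed, previous, candidate, followers, weights=(0.15, 0.30, 0.55)):
    """Weighted mean of LENGTH, CONTEXT and STRUCTURE (weights sum to 1)."""
    a, b, c = weights
    return (a * length_score(typed, candidate)
            + b * context_score(previous, candidate, followers)
            + c * structure_score(typed, candidate))

# Hypothetical follower counts: "the" was followed by "park" twice, "supermarket" once
followers = {"the": {"park": 2, "supermarket": 1}}
vocabulary = ["park", "supermarket", "home"]
best = max(vocabulary, key=lambda w: affinity("prak", "the", w, followers))
print(best)
```

With the misspelling "prak" typed after "the", the candidate "park" wins on all three parameters: same length, most frequent follower, and every letter at most one position away.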

To enhance prediction performance, the algorithm recalibrates these parameters with two further correction mechanisms:
- BIAS: the weight of the LENGTH parameter is reduced while only a few letters have been typed, since the length is probably not final yet (i.e., more letters will be typed and the length will soon increase)
- BONUS: if the exact typed word exists in the vocabulary, its affinity is raised to the maximum, so that it prevails over all other vocabulary words.
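
A small Python sketch of the two corrections, using the constants that appear in the listing's main loop (0.10 and 0.05, halved with each typed letter, and a bonus of 1). The function names `biased_weights` and `apply_bonus` are hypothetical.

```python
def biased_weights(typed_len, base=(0.15, 0.30, 0.55)):
    """BIAS: move weight away from LENGTH while few letters have been typed."""
    length_w, context_w, structure_w = base
    bias1 = 0.10 / 2 ** typed_len  # shifted from LENGTH to CONTEXT
    bias2 = 0.05 / 2 ** typed_len  # shifted from LENGTH to STRUCTURE
    return length_w - bias1 - bias2, context_w + bias1, structure_w + bias2

def apply_bonus(score, length_score, structure_score, bonus=1.0):
    """BONUS: an exact match (LENGTH = 1 and STRUCTURE = 1) is pushed to the maximum."""
    if length_score == 1 and structure_score == 1:
        return min(1.0, score + bonus)
    return score

for typed_len in (0, 1, 2, 5):
    print(typed_len, [round(w, 4) for w in biased_weights(typed_len)])
```

With no letters typed, LENGTH has zero weight and the prediction rests on CONTEXT and STRUCTURE alone; after a handful of letters the weights are close to their base values again. The three weights always sum to 1.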

Although the algorithm needs a long, complete sample text that is representative of the language in order to work fully, the default sample hard-coded in the program already gives a good example of its predictive behavior.
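
To show what learning from the sample amounts to, here is a minimal Python sketch that counts which token follows which in the default sample. It is an illustration only: the regular expression and the names `tokenize`, `learn_links` and `context` are assumptions, and the QB64 program stores sorted link arrays rather than dictionaries.

```python
import re
from collections import Counter, defaultdict

SAMPLE = ("Today is a beautiful day, tomorrow I'm going to the park. "
          "The park is far from home, so today I will stay at home or maybe "
          "I will go to the supermarket. By the way, your house is beautiful "
          "and big, I really like it!")

def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z']+|[.,:;!?]", text.lower())

def learn_links(text):
    """Count every (previous token, next token) pair; a leading '.' marks the start."""
    tokens = ["."] + tokenize(text)
    followers = defaultdict(Counter)
    for previous, following in zip(tokens, tokens[1:]):
        followers[previous][following] += 1
    return followers

def context(followers, previous, candidate):
    """Estimated frequency with which `candidate` follows `previous`."""
    counts = followers.get(previous, Counter())
    total = sum(counts.values())
    return counts[candidate] / total if total else 0.0

followers = learn_links(SAMPLE)
print(dict(followers["the"]))
print(context(followers, "the", "park"))
```

In this sample "the" is followed by "park" twice and by "supermarket" and "way" once each, so "park" gets a CONTEXT value of 0.5 after "the". With so little text most words have a single follower, which is why a longer sample matters.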

Code: QB64:
_Title "Text Corrector"
Common Shared L: L = 100000 + 1 ' max number of links + 1
Dim Shared numWords As Integer
Dim Shared numLinks As Integer
Dim Shared numCountedLinks As Integer
Dim Shared numSearchedCountedLinks As Integer
Dim Shared totSearchedCountedLinks As Integer
Dim Shared links(L, 2) As String
Dim Shared countedLinks(L, 3) As String
Dim Shared searchedCountedLinks(L, 3) As String
Dim Shared words(L) As String
Dim Shared wordsProbability(L, 4) As Double ' 1) LENGTH   2) CONTEXT   3) STRUCTURE   4) GLOBAL
Dim Shared GaussTable(0 To 400) As Double: fillGaussTable

' Use sample.txt when it exists, otherwise fall back to the built-in sample
If _FileExists("sample.txt") Then
    Open "sample.txt" For Input As #1
    Do Until EOF(1)
        Line Input #1, temp$
        sample$ = sample$ + temp$ + " "
    Loop
    Close #1
Else
    sample$ = "Today is a beautiful day, tomorrow I'm going to the park. The park is far from home, so today I will stay at home or maybe I will go to the supermarket. By the way, your house is beautiful and big, I really like it!"
End If

lastArchivedWord$ = "."
isFirstTime = -1
lastInputWasPunctuation = 0
learnFromSample (sample$)
Do
    ' Recompute affinities and redraw only after a key press (or on the first pass)
    If Not key$ = "" Or isFirstTime Then
        isFirstTime = 0
        bias1 = 0.10 / (2 ^ (1 * Len(newWord$))) ' with few letters context >> length
        bias2 = 0.05 / (2 ^ (1 * Len(newWord$))) ' with few letters structure >> length
        computeWordAffinity lastArchivedWord$, newWord$, 0.15 - bias1 - bias2, 0.30 + bias1, 0.55 + bias2, 1
        refreshTextEditor archivedText$, lastArchivedWord$, newWord$
    End If

    key$ = LCase$(InKey$): _Limit 20
    If Not key$ = "" Then
        If isPunctuation(key$) Then
            archivedText$ = archivedText$ + " " + lastArchivedWord$ + " "
            If newWord$ = "" Then
                archivedText$ = archivedText$ + " "
            Else
                archivedText$ = archivedText$ + " " + highestAffinityWord$ + " "
                newWord$ = ""
            End If
            lastArchivedWord$ = key$
            lastInputWasPunctuation = -1
        Else
            Select Case key$
                Case " ":
                    If Not lastInputWasPunctuation Then
                        archivedText$ = archivedText$ + " " + lastArchivedWord$ + " "
                        If isPunctuation(newWord$) Then
                            lastArchivedWord$ = newWord$
                        Else
                            lastArchivedWord$ = highestAffinityWord$
                        End If
                        newWord$ = ""
                    Else
                        lastInputWasPunctuation = 0
                        key$ = ""
                    End If
                Case Chr$(8):
                    If Not Len(newWord$) = 0 Then
                        newWord$ = Left$(newWord$, Len(newWord$) - 1)
                    Else
                        key$ = ""
                    End If
                Case Chr$(13): key$ = ""
                Case Chr$(27): System
                Case Else:
                    newWord$ = newWord$ + key$
                    lastInputWasPunctuation = 0
            End Select
        End If
    End If
Loop

Sub refreshTextEditor (archivedText$, lastArchivedWord$, newWord$)
    showComputedWordAffinity
    toprint$ = archivedText$
    If Not isPunctuation(lastArchivedWord$) Then toprint$ = toprint$ + " "
    toprint$ = toprint$ + lastArchivedWord$
    If Not isPunctuation(newWord$) Then toprint$ = toprint$ + " "
    toprint$ = toprint$ + newWord$ + "_"
    showText destandardize$(Right$(toprint$, Len(toprint$) - 2), -1), Len(newWord$), -1
    _Display
End Sub

Sub showText (text$, numOfFinalCharsToColor, colorFinalChar)
    ' Keep only the last 10 rows of 80 characters
    Do While Len(text$) > 80 * 10
        text$ = Right$(text$, Len(text$) - 80)
    Loop
    Color 7: Locate 1, 1
    Print "Write something, I'll try to correct you:"
    Color 15: Locate 2, 1
    line$ = ""
    For I = 1 To Len(text$)
        line$ = line$ + Mid$(text$, I, 1)
        If I = Len(text$) - numOfFinalCharsToColor Then Color 13
        If colorFinalChar And I = Len(text$) Then Color 7
        Print Mid$(text$, I, 1);
        If Len(line$) >= 80 Or I = Len(text$) Then line$ = "": Print
    Next I
End Sub

Sub showComputedWordAffinity
    best$ = highestAffinityWord$
    Cls: Locate 13, 1: Color 8
    Print "WORD", "LENGTH", "CONTEXT", "STRUCTURE", "GLOBAL"
    For i = 1 To min(numWords, 10)
        If words(i) = best$ Then Color 15
        If Not isSpecial(words(i)) Then
            If Len(words(i)) > 12 Then
                Print Left$(words(i), 11); " ",
            Else
                Print words(i); " ",
            End If
            Print Int(wordsProbability(i, 1) * 100); "%   ", Int(wordsProbability(i, 2) * 100); "%   ", Int(wordsProbability(i, 3) * 100); "%   ", max(Int(wordsProbability(i, 4) * 100), 0); "%"
        End If
        If words(i) = best$ Then Color 8
    Next i
End Sub

Function highestAffinityWord$
    ' words() is kept sorted by descending affinity, so the first regular word wins
    For I = 1 To numWords
        If Not isSpecial(words(I)) Then
            highestAffinityWord$ = words(I)
            Exit For
        End If
    Next I
End Function

Sub learnFromSample (sample$)
    decomposeIntoLinks (sample$)
    orderLinks
    countLinks
    collectWords
End Sub

Sub computeWordAffinity (oldword$, newWord$, A, B, C, BONUS) ' A = length weight, B = context weight, C = structure weight, BONUS = value added to global probability (constraint: A + B + C = 1)
    For I = 1 To numWords
        wordsProbability(I, 1) = 0
        wordsProbability(I, 2) = 0
        wordsProbability(I, 3) = 0
        wordsProbability(I, 1) = probability(Len(newWord$), Len(words(I))) / probability(0, 0)
        wordsProbability(I, 3) = structureLikeness(newWord$, words(I))
    Next I
    searchLinks (oldword$)
    For I = 1 To numSearchedCountedLinks
        position = wordPosition(searchedCountedLinks(I, 2))
        wordsProbability(position, 2) = Val(searchedCountedLinks(I, 3)) / totSearchedCountedLinks
    Next I
    For I = 1 To numWords
        wordsProbability(I, 4) = Int((A * wordsProbability(I, 1) + B * wordsProbability(I, 2) + C * wordsProbability(I, 3)) * 10 ^ 4) / 10 ^ 4
        ' Exact match (same length, same letters in the same positions): apply the bonus
        If wordsProbability(I, 1) = 1 And wordsProbability(I, 3) = 1 Then wordsProbability(I, 4) = min(1, wordsProbability(I, 4) + BONUS)
    Next I
    orderComputedWords
    shuffleWords
End Sub

Sub orderComputedWords
    ' Bubble sort of words() by descending global affinity
    Do
        changed = 0
        For I = 2 To numWords
            If wordsProbability(I, 4) > wordsProbability(I - 1, 4) Then
                buffer$ = words(I - 1)
                buffer1 = wordsProbability(I - 1, 1)
                buffer2 = wordsProbability(I - 1, 2)
                buffer3 = wordsProbability(I - 1, 3)
                buffer4 = wordsProbability(I - 1, 4)
                words(I - 1) = words(I)
                wordsProbability(I - 1, 1) = wordsProbability(I, 1)
                wordsProbability(I - 1, 2) = wordsProbability(I, 2)
                wordsProbability(I - 1, 3) = wordsProbability(I, 3)
                wordsProbability(I - 1, 4) = wordsProbability(I, 4)
                words(I) = buffer$
                wordsProbability(I, 1) = buffer1
                wordsProbability(I, 2) = buffer2
                wordsProbability(I, 3) = buffer3
                wordsProbability(I, 4) = buffer4
                changed = 1
            End If
        Next I
    Loop Until changed = 0
End Sub

Sub shuffleWords
    ' Swap neighbouring words that have the same global affinity
    For I = 2 To numWords
        If wordsProbability(I, 4) = wordsProbability(I - 1, 4) Then
            buffer$ = words(I - 1)
            buffer1 = wordsProbability(I - 1, 1)
            buffer2 = wordsProbability(I - 1, 2)
            buffer3 = wordsProbability(I - 1, 3)
            buffer4 = wordsProbability(I - 1, 4)
            words(I - 1) = words(I)
            wordsProbability(I - 1, 1) = wordsProbability(I, 1)
            wordsProbability(I - 1, 2) = wordsProbability(I, 2)
            wordsProbability(I - 1, 3) = wordsProbability(I, 3)
            wordsProbability(I - 1, 4) = wordsProbability(I, 4)
            words(I) = buffer$
            wordsProbability(I, 1) = buffer1
            wordsProbability(I, 2) = buffer2
            wordsProbability(I, 3) = buffer3
            wordsProbability(I, 4) = buffer4
        End If
    Next I
End Sub

Function wordPosition (searchedWord$)
    For I = 1 To numWords
        If words(I) = searchedWord$ Then
            wordPosition = I
            Exit For
        End If
    Next I
End Function

Sub collectWords
    ' countedLinks() is sorted, so each distinct first word appears in one run
    oldstr$ = countedLinks(1, 1)
    j = 1
    For i = 2 To numCountedLinks
        If Not countedLinks(i, 1) = oldstr$ Then
            If Not isSpecial(oldstr$) Then
                words(j) = oldstr$
                j = j + 1
            End If
            oldstr$ = countedLinks(i, 1)
        End If
    Next i
    If Not isSpecial(oldstr$) Then
        words(j) = oldstr$
        numWords = j
    Else
        numWords = j - 1
    End If
End Sub

Function structureLikeness (myWord$, vocabularyWord$)
    Dim totDistance As Double: totDistance = 0
    For I = 1 To Len(myWord$)
        distance = charDistance(Mid$(myWord$, I, 1), vocabularyWord$, I)
        totDistance = totDistance + probability(distance, 0)
    Next I
    If Len(myWord$) = 0 Then
        structureLikeness = 0
    Else
        structureLikeness = totDistance / (Len(myWord$) * probability(0, 0))
    End If
End Function

Sub searchLinks (keyword$)
    For I = 1 To numCountedLinks
        searchedCountedLinks(I, 1) = ""
        searchedCountedLinks(I, 2) = ""
        searchedCountedLinks(I, 3) = ""
    Next I
    j = 0
    totSearchedCountedLinks = 0
    For I = 1 To numCountedLinks
        If countedLinks(I, 1) = keyword$ Then
            j = j + 1
            searchedCountedLinks(j, 1) = countedLinks(I, 1)
            searchedCountedLinks(j, 2) = countedLinks(I, 2)
            searchedCountedLinks(j, 3) = countedLinks(I, 3)
            totSearchedCountedLinks = totSearchedCountedLinks + Val(countedLinks(I, 3))
        End If
    Next I
    numSearchedCountedLinks = j
End Sub

Sub countLinks
    ' Collapse runs of identical links in the sorted links() array into counts
    oldstr1$ = links(1, 1)
    oldstr2$ = links(1, 2)
    counter = 1
    j = 1
    For I = 2 To numLinks
        If links(I, 1) = oldstr1$ And links(I, 2) = oldstr2$ Then
            counter = counter + 1
        Else
            countedLinks(j, 1) = oldstr1$
            countedLinks(j, 2) = oldstr2$
            countedLinks(j, 3) = Str$(counter)
            oldstr1$ = links(I, 1)
            oldstr2$ = links(I, 2)
            j = j + 1
            counter = 1
        End If
    Next I
    countedLinks(j, 1) = oldstr1$
    countedLinks(j, 2) = oldstr2$
    countedLinks(j, 3) = Str$(counter)
    numCountedLinks = j
End Sub

Sub decomposeIntoLinks (sample$)
    sample$ = standardize$(sample$)
    Dim oldword As String
    Dim newword As String
    Dim extractedword As String
    For I = 1 To Len(sample$)
        j = InStr(I, sample$, " ")
        If j = 0 Then Exit For
        extractedword = Mid$(sample$, I - 1, j - I + 1)
        If Len(extractedword) > 0 Then ' was Len(extracted), an undeclared variable
            oldword = newword
            newword = extractedword
            If Not I = 1 Then
                row = row + 1
                links(row, 1) = _Trim$(oldword)
                links(row, 2) = _Trim$(newword)
            End If
        End If
        I = j + 1
    Next I
    numLinks = row
End Sub

Sub orderLinks
    ' Bubble sort of links() by (first word, second word)
    Do
        changed = 0
        For I = 2 To numLinks
            result = compare(links(I - 1, 1) + Chr$(255) + links(I - 1, 2), links(I, 1) + Chr$(255) + links(I, 2))
            If result = 2 Then
                buffer1$ = links(I - 1, 1)
                buffer2$ = links(I - 1, 2)
                links(I - 1, 1) = links(I, 1)
                links(I - 1, 2) = links(I, 2)
                links(I, 1) = buffer1$
                links(I, 2) = buffer2$
                changed = 1
            End If
        Next I
    Loop Until changed = 0
End Sub

Function standardize$ (sample$)
    sample$ = "." + _Trim$(LCase$(sample$)) + ". "
    For i = 1 To Len(sample$)
        char$ = Mid$(sample$, i, 1)
        If isUnsupported(char$) Then char$ = " "
        If isPunctuation(char$) Then char$ = " " + char$ + " "
        standardize$ = standardize$ + char$
    Next i
End Function

Function destandardize$ (sample$, uppercase)
    If uppercase Then
        sample$ = UCase$(Left$(_Trim$(sample$), 1)) + Right$(_Trim$(sample$), Len(_Trim$(sample$)) - 1)
    Else
        sample$ = _Trim$(sample$)
    End If
    For i = 1 To Len(sample$)
        char$ = Mid$(sample$, i, 1)
        nextchar$ = Mid$(sample$, i + 1, 1)
        If char$ = " " And isPunctuation(nextchar$) Then
        ElseIf char$ = " " And nextchar$ = " " Then
        Else
            destandardize$ = destandardize$ + char$
        End If
        punctuation = isPunctuation(char$)
        If punctuation = -1 Then
            destandardize$ = destandardize$ + " " + destandardize$(Right$(sample$, Len(sample$) - i - 1), -1)
            Exit For
        ElseIf punctuation = -2 Then
            destandardize$ = destandardize$ + " " + destandardize$(Right$(sample$, Len(sample$) - i - 1), 0)
            Exit For
        End If
    Next i
End Function

Function probability (value, mean) ' with standard deviation = 2
    probability = cumulativeProbability(value + 0.5, mean, 2) - cumulativeProbability(value - 0.5, mean, 2)
End Function

Function cumulativeProbability (value, mean, deviation)
    value = (value - mean) / deviation
    adaptedvalue = Abs(Int(value * 100))
    If adaptedvalue <= 400 Then
        cumulativeProbability = GaussTable(adaptedvalue)
    Else
        cumulativeProbability = 1
    End If
    If value < 0 Then cumulativeProbability = 1 - cumulativeProbability
End Function

Sub fillGaussTable
    ' Standard normal CDF at z = index / 100; missing entries are interpolated below
    GaussTable(0) = 0.5000
    GaussTable(5) = 0.5199
    GaussTable(10) = 0.5398
    GaussTable(15) = 0.5596
    GaussTable(20) = 0.5793
    GaussTable(25) = 0.5987
    GaussTable(30) = 0.6179
    GaussTable(35) = 0.6368
    GaussTable(40) = 0.6554
    GaussTable(45) = 0.6736
    GaussTable(50) = 0.6915
    GaussTable(55) = 0.7088
    GaussTable(60) = 0.7257
    GaussTable(65) = 0.7421
    GaussTable(70) = 0.7580
    GaussTable(75) = 0.7734
    GaussTable(80) = 0.7881
    GaussTable(85) = 0.8023
    GaussTable(90) = 0.8159
    GaussTable(95) = 0.8289
    GaussTable(100) = 0.8413
    GaussTable(105) = 0.8531
    GaussTable(110) = 0.8643
    GaussTable(115) = 0.8749
    GaussTable(120) = 0.8849
    GaussTable(125) = 0.8944
    GaussTable(130) = 0.9032
    GaussTable(135) = 0.9115
    GaussTable(140) = 0.9192
    GaussTable(145) = 0.9265
    GaussTable(150) = 0.9332
    GaussTable(155) = 0.9394
    GaussTable(160) = 0.9452
    GaussTable(165) = 0.9505
    GaussTable(170) = 0.9554
    GaussTable(175) = 0.9599
    GaussTable(180) = 0.9641
    GaussTable(185) = 0.9678
    GaussTable(190) = 0.9713
    GaussTable(195) = 0.9744
    GaussTable(200) = 0.9772
    GaussTable(210) = 0.9821
    GaussTable(220) = 0.9861
    GaussTable(230) = 0.9893
    GaussTable(240) = 0.9918
    GaussTable(250) = 0.9938
    GaussTable(260) = 0.9953
    GaussTable(270) = 0.9965
    GaussTable(280) = 0.9974
    GaussTable(290) = 0.9981
    GaussTable(310) = 0.9990
    GaussTable(390) = 1
    GaussTable(400) = 1
    Dim lastValidValue As Double
    Dim lastValidValuePosition As Integer
    Dim nextValidValue As Double
    Dim nextValidValuePosition As Integer
    Dim interpolation As Double
    For i = 0 To 399
        If GaussTable(i) = 0 Then
            nextValidValue = 0
            j = i + 1
            While nextValidValue = 0
                If Not GaussTable(j) = 0 Then
                    nextValidValue = GaussTable(j)
                    nextValidValuePosition = j
                Else
                    j = j + 1
                End If
            Wend
            interpolation = Int((nextValidValue - lastValidValue) * (i - lastValidValuePosition) / (nextValidValuePosition - lastValidValuePosition) * 10 ^ 4) / 10 ^ 4
            GaussTable(i) = lastValidValue + interpolation
        Else
            lastValidValue = GaussTable(i)
            lastValidValuePosition = i
        End If
    Next i
End Sub

Function compare (str1$, str2$) ' returns 0 if str1$ = str2$, 1 if str1$ < str2$, 2 if str1$ > str2$
    If Len(str1$) < Len(str2$) Then compare = 1
    If Len(str1$) > Len(str2$) Then compare = 2
    For I = 1 To min(Len(str1$), Len(str2$))
        If Asc(Mid$(str1$, I, 1)) > Asc(Mid$(str2$, I, 1)) Then
            compare = 2
            Exit For
        End If
        If Asc(Mid$(str1$, I, 1)) < Asc(Mid$(str2$, I, 1)) Then
            compare = 1
            Exit For
        End If
    Next I
End Function

Function charDistance (char$, word$, from)
    ' Offset of the nearest occurrence of char$ around position from, or 10 if none
    charDistance = 10
    For I = 0 To min(max(from - 1, Len(word$) - from), 10)
        If from - I > 0 Then
            If Mid$(word$, from - I, 1) = char$ Then
                charDistance = I
                Exit For
            End If
        End If
        If from + I <= Len(word$) Then
            If Mid$(word$, from + I, 1) = char$ Then
                charDistance = I
                Exit For
            End If
        End If
    Next I
End Function

Function isSpecial (word$)
    isSpecial = 0
    If isPunctuation(word$) Or word$ = " " Or word$ = "" Then isSpecial = -1
End Function

Function isPunctuation (char$) ' returns -1 if next word is uppercase, otherwise returns -2
    isPunctuation = 0
    If char$ = "." Or char$ = "!" Or char$ = "?" Then isPunctuation = -1
    If char$ = "," Or char$ = ":" Or char$ = ";" Then isPunctuation = -2
End Function

Function isUnsupported (char$)
    isUnsupported = 0
    If char$ = "“" Or char$ = "”" Or char$ = Chr$(13) Or char$ = Chr$(0) Or char$ = "-" Or char$ = Chr$(34) Or char$ = "/" Or char$ = "(" Or char$ = ")" Or char$ = "^" Or char$ = "[" Or char$ = "]" Or char$ = "{" Or char$ = "}" Or char$ = "_" Or char$ = "<" Or char$ = ">" Then isUnsupported = -1
End Function

Function min (int1, int2)
    If int1 < int2 Then
        min = int1
    Else
        min = int2
    End If
End Function

Function max (int1, int2)
    If int1 > int2 Then
        max = int1
    Else
        max = int2
    End If
End Function

Pete
Re: Text Corrector
« Reply #1 on: October 12, 2021, 06:03:45 pm »
I'm not sure what to think of this. I mean, given a long enough vocabulary list, maybe; but the sample gets very little correct, and when you try to load large text files, it either cannot handle strings in excess of 10,000 characters or it just takes too long to process. A text file with a string length of 5308 took 10 seconds to process and got about 60% correct. For instance...

The days are getting shorter and the nights are getting longer.

was "corrected" to...

The days are getting more and the lights are getting more.

8 for 11. The words nights, longer, and shorter were not in the vocab list. So, not bad, but how did it miss nights? It was in the vocab list.

Next...

i'm a Yankee Doodle Dandy got "corrected" to

i'm a leaker people hands.

2 for 5, and it did not capitalize the i in i'm. You guessed it: Yankee, Doodle, and Dandy were not in the vocab list.

Anyway...

The post reminded me of working with the Levenshtein distance string metric in my word processor to add spell check. The thing is, I had to include an extensive hash file to get reasonable results. To actually use this to get your computer to learn any language is not going to happen without a much faster way to load and analyze a word list.
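
For readers who have not met the Levenshtein distance, a minimal Python sketch follows. It is the textbook dynamic-programming version, not the code from the word processor mentioned above, and the helper `closest` is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def closest(word, vocabulary):
    """Hypothetical helper: the vocabulary entry with the smallest edit distance."""
    return min(vocabulary, key=lambda entry: levenshtein(word, entry))

print(levenshtein("kitten", "sitting"))
print(closest("nigths", ["nights", "lights", "more"]))
```

Each comparison costs time proportional to the product of the two word lengths, so checking a typed word against a large list is slow unless the candidates are narrowed first, for example with a hash file as described above.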

Pete
Want to learn how to write code on cave walls? https://www.tapatalk.com/groups/qbasic/qbasic-f1/

Sanmayce
Re: Text Corrector
« Reply #2 on: October 12, 2021, 08:47:05 pm »
Hi PoliMi,
it's nice to see another text auto-completion fan here.
Please consider making a benchmark of your program in order to show its speed, memory requirements and so on.

As far as I can see, the ultimate task is to make a phrase-checker (not a mere spell-checker dealing with 1-grams, i.e. single words) that analyzes 2-grams, 3-grams and 4-grams. Next year I will update my old phrase-checking package and consolidate the code with my Masakari tool.

A few months back, I played with a small English article (80 lines long, 28KB); the package is attached for all who want to play with it. It could serve as a starting point for examples/tests. My textual madness knows no limits: the end game is to have corpora of already ripped unigrams, bigrams, trigrams, tetragrams and pentagrams, and to use them as a reference phrase-check file against a given file, rather than (as in the example below) using the file itself as a self-checking reference.
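
To make the idea of ripping n-grams concrete, here is a minimal in-memory Python sketch. It is not how the Leprechaun tool works (its own output describes a hash with external B-trees); the names `ngrams`, `rip` and `containing` and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rip(text, max_n=5):
    """Count the occurrences of every 1-gram up to max_n-gram in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}

def containing(counts, word):
    """Only the n-grams in which `word` appears as a whole word."""
    return {gram: count for gram, count in counts.items() if word in gram.split()}

tables = rip("Take the day and take the night", max_n=3)
for n, counts in tables.items():
    print(n, containing(counts, "take"))
```

A phrase-checker built on such tables would flag a typed phrase whose 2-grams or 3-grams never occur in the reference corpus, which is a stricter test than checking each word on its own.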

The console dump below shows that in less than 1 minute (not that bad), from 3:34:44.20 to 3:35:40.04, we obtain the occurrences of all 1-grams up to 5-grams containing 'take':

 
[Attached image: postQB.png]

Code: (console output)
E:\z\JRJ>dir
 Volume in drive E is Sanmayce_111GB
 Volume Serial Number is 1410-10F9

 Directory of E:\z\JRJ

10/13/2021  03:34 AM    <DIR>          .
10/13/2021  03:34 AM    <DIR>          ..
04/22/2021  05:36 AM    <DIR>          JRJ_essay
04/22/2021  05:37 AM           109,582 Kazahana_Hexadecad_GCC_102_32bit.exe
04/22/2021  05:37 AM         2,494,334 Kazahana_sources_binaries.zip
04/22/2021  05:37 AM           135,680 Leprechaun_x-leton_32bit_Intel_01_001p.exe
04/22/2021  05:37 AM           134,144 Leprechaun_x-leton_32bit_Intel_01_008p.exe
04/22/2021  05:37 AM           134,144 Leprechaun_x-leton_32bit_Intel_01_512p.exe
04/22/2021  05:37 AM           139,264 Leprechaun_x-leton_32bit_Intel_02_001p.exe
04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_02_008p.exe
04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_02_512p.exe
04/22/2021  05:37 AM           139,264 Leprechaun_x-leton_32bit_Intel_03_001p.exe
04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_03_008p.exe
04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_03_512p.exe
04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_04_001p.exe
04/22/2021  05:37 AM           136,704 Leprechaun_x-leton_32bit_Intel_04_008p.exe
04/22/2021  05:37 AM           136,704 Leprechaun_x-leton_32bit_Intel_04_512p.exe
04/22/2021  05:37 AM           139,776 Leprechaun_x-leton_32bit_Intel_05_001p.exe
04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_05_008p.exe
04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_05_512p.exe
04/22/2021  05:37 AM           695,925 Leprechaun_x-leton_r17tag.7z
04/22/2021  05:37 AM            96,993 LineJustify_PAGODAo5.c
04/22/2021  05:37 AM            73,728 LineJustify_PAGODAo5.exe
04/22/2021  05:37 AM            94,162 LineWordreporter.c
04/22/2021  05:37 AM            77,312 LineWordreporter.exe
04/22/2021  05:37 AM             1,633 MokujIN JADE 217 prompt.lnk
04/22/2021  05:37 AM            28,393 The_Task_of_a_Philosopher.txt
04/22/2021  05:37 AM             3,335 XGRAM_PAGODA5.bat
04/22/2021  05:37 AM             2,181 XGRAM_RIP_file.bat
04/22/2021  05:37 AM            35,015 Yoshi.exe
04/22/2021  05:37 AM           969,892 Yoshi7-.zip
             28 File(s)      6,743,797 bytes
              3 Dir(s)   3,712,290,816 bytes free

E:\z\JRJ>time
The current time is:  3:34:44.20
Enter the new time:

E:\z\JRJ>XGRAM_RIP_file.bat The_Task_of_a_Philosopher.txt

E:\z\JRJ>set TMP=d:\

E:\z\JRJ>set TEMP=d:\

E:\z\JRJ>Leprechaun_x-leton_32bit_Intel_01_001p.exe The_Task_of_a_Philosopher.txt.lst The_Task_of_a_Philosopher.txt.01 1123123 Y
Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 17, written by Svalqyatchx.
Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
Feature3: In this revision, 1 pass is to be made.
Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
Pass #1 of 1:
Size of input file with files for Leprechauning: 31
Allocating HASH memory 536,870,977 bytes ... OK
Allocating memory 1097MB ... OK
Size of Input TEXTual file: 28,393
/; 00,004,391P/s; Phrase count: 4,391 of them 1,313 distinct; Done: 64/64
Bytes per second performance: 28,393B/s
Phrases per second performance: 4,391P/s
Time for putting phrases into trees: 1 second(s)
Flushing UNsorted phrases: 100%; Shaking trees performance: 00,002,626P/s
Time for shaking phrases from trees: 1 second(s)
Leprechaun: Current pass done.

Total memory needed for one pass: 124KB
Total distinct phrases: 1,313
Total time: 1 second(s)
Total performance: 4,391P/s i.e. phrases per second
Leprechaun: Done.

E:\z\JRJ>Leprechaun_x-leton_32bit_Intel_02_001p.exe The_Task_of_a_Philosopher.txt.lst The_Task_of_a_Philosopher.txt.02 1123123 Y
Leprechaun_doubleton (Fast-In-Future Greedy n-gram-Ripper), rev. 17, written by Svalqyatchx.
Purpose: Rips all distinct 2-grams (2-word phrases) with length 5..41 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
Feature3: In this revision, 1 pass is to be made.
Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
Pass #1 of 1:
Size of input file with files for Leprechauning: 31
Allocating HASH memory 536,870,977 bytes ... OK
Allocating memory 1097MB ... OK
Size of Input TEXTual file: 28,393
-; 00,003,929P/s; Phrase count: 3,929 of them 3,133 distinct; Done: 64/64
Bytes per second performance: 28,393B/s
Phrases per second performance: 3,929P/s
Time for putting phrases into trees: 1 second(s)
Flushing UNsorted phrases: 100%; Shaking trees performance: 00,006,266P/s
Time for shaking phrases from trees: 1 second(s)
Leprechaun: Current pass done.

Total memory needed for one pass: 355KB
Total distinct phrases: 3,133
Total time: 1 second(s)
Total performance: 3,929P/s i.e. phrases per second
Leprechaun: Done.

E:\z\JRJ>Leprechaun_x-leton_32bit_Intel_03_001p.exe The_Task_of_a_Philosopher.txt.lst The_Task_of_a_Philosopher.txt.03 1123123 Y
Leprechaun_tripleton (Fast-In-Future Greedy n-gram-Ripper), rev. 17, written by Svalqyatchx.
Purpose: Rips all distinct 3-grams (3-word phrases) with length 9..41 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
Feature3: In this revision, 1 pass is to be made.
Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
Pass #1 of 1:
Size of input file with files for Leprechauning: 31
Allocating HASH memory 536,870,977 bytes ... OK
Allocating memory 1097MB ... OK
Size of Input TEXTual file: 28,393
\; 00,003,574P/s; Phrase count: 3,574 of them 3,430 distinct; Done: 64/64
Bytes per second performance: 28,393B/s
Phrases per second performance: 3,574P/s
Time for putting phrases into trees: 1 second(s)
Flushing UNsorted phrases: 100%; Shaking trees performance: 00,006,860P/s
Time for shaking phrases from trees: 1 second(s)
Leprechaun: Current pass done.

Total memory needed for one pass: 389KB
Total distinct phrases: 3,430
Total time: 1 second(s)
Total performance: 3,574P/s i.e. phrases per second
Leprechaun: Done.

E:\z\JRJ>Leprechaun_x-leton_32bit_Intel_04_001p.exe The_Task_of_a_Philosopher.txt.lst The_Task_of_a_Philosopher.txt.04 1123123 Y
Leprechaun_quadrupleton (Fast-In-Future Greedy n-gram-Ripper), rev. 17, written by Svalqyatchx.
Purpose: Rips all distinct 4-grams (4-word phrases) with length 13..51 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
  133. Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
  134. Feature3: In this revision, 1 pass is to be made.
  135. Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
  136. Pass #1 of 1:
  137. Size of input file with files for Leprechauning: 31
  138. Allocating HASH memory 536,870,977 bytes ... OK
  139. Allocating memory 1097MB ... OK
  140. Size of Input TEXTual file: 28,393
  141. |; 00,003,238P/s; Phrase count: 3,238 of them 3,205 distinct; Done: 64/64
  142. Bytes per second performance: 28,393B/s
  143. Phrases per second performance: 3,238P/s
  144. Time for putting phrases into trees: 1 second(s)
  145. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,006,410P/s
  146. Time for shaking phrases from trees: 1 second(s)
  147. Leprechaun: Current pass done.
  148.  
  149. Total memory needed for one pass: 426KB
  150. Total distinct phrases: 3,205
  151. Total time: 1 second(s)
  152. Total performance: 3,238P/s i.e. phrases per second
  153. Leprechaun: Done.
  154.  
  155. E:\z\JRJ>Leprechaun_x-leton_32bit_Intel_05_001p.exe The_Task_of_a_Philosopher.txt.lst The_Task_of_a_Philosopher.txt.05 1123123 Y
  156. Leprechaun_quintupleton (Fast-In-Future Greedy n-gram-Ripper), rev. 17, written by Svalqyatchx.
  157. Purpose: Rips all distinct 5-grams (5-word phrases) with length 17..61 chars from incoming texts.
  158. Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
  159. Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
  160. Feature3: In this revision, 1 pass is to be made.
  161. Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
  162. Pass #1 of 1:
  163. Size of input file with files for Leprechauning: 31
  164. Allocating HASH memory 536,870,977 bytes ... OK
  165. Allocating memory 1097MB ... OK
  166. Size of Input TEXTual file: 28,393
  167. /; 00,002,913P/s; Phrase count: 2,913 of them 2,905 distinct; Done: 64/64
  168. Bytes per second performance: 28,393B/s
  169. Phrases per second performance: 2,913P/s
  170. Time for putting phrases into trees: 1 second(s)
  171. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,005,810P/s
  172. Time for shaking phrases from trees: 1 second(s)
  173. Leprechaun: Current pass done.
  174.  
  175. Total memory needed for one pass: 443KB
  176. Total distinct phrases: 2,905
  177. Total time: 1 second(s)
  178. Total performance: 2,913P/s i.e. phrases per second
  179. Leprechaun: Done.
  180.  
  181. E:\z\JRJ>sort /+10 /M 1012012 The_Task_of_a_Philosopher.txt.01 /O The_Task_of_a_Philosopher.txt.01.txt
  182.  
  183. E:\z\JRJ>sort /+10 /M 1012012 The_Task_of_a_Philosopher.txt.02 /O The_Task_of_a_Philosopher.txt.02.txt
  184.  
  185. E:\z\JRJ>sort /+10 /M 1012012 The_Task_of_a_Philosopher.txt.03 /O The_Task_of_a_Philosopher.txt.03.txt
  186.  
  187. E:\z\JRJ>sort /+10 /M 1012012 The_Task_of_a_Philosopher.txt.04 /O The_Task_of_a_Philosopher.txt.04.txt
  188.  
  189. E:\z\JRJ>sort /+10 /M 1012012 The_Task_of_a_Philosopher.txt.05 /O The_Task_of_a_Philosopher.txt.05.txt
  190.  
  191.  Volume in drive E is Sanmayce_111GB
  192.  Volume Serial Number is 1410-10F9
  193.  
  194.  Directory of E:\z\JRJ
  195.  
  196. 10/13/2021  03:34 AM            25,290 The_Task_of_a_Philosopher.txt.01.txt
  197. 10/13/2021  03:34 AM            74,892 The_Task_of_a_Philosopher.txt.02.txt
  198. 10/13/2021  03:34 AM           101,334 The_Task_of_a_Philosopher.txt.03.txt
  199. 10/13/2021  03:34 AM           113,997 The_Task_of_a_Philosopher.txt.04.txt
  200. 10/13/2021  03:34 AM           120,996 The_Task_of_a_Philosopher.txt.05.txt
  201.               5 File(s)        436,509 bytes
  202.               0 Dir(s)   3,711,840,256 bytes free
  203.  
  204. E:\z\JRJ>copy The_Task_of_a_Philosopher.txt.01.txt Gallowwalker.1.txt.sorted
  205.        1 file(s) copied.
  206.  
  207. E:\z\JRJ>copy The_Task_of_a_Philosopher.txt.02.txt Gallowwalker.2.txt.sorted
  208.        1 file(s) copied.
  209.  
  210. E:\z\JRJ>copy The_Task_of_a_Philosopher.txt.03.txt Gallowwalker.3.txt.sorted
  211.        1 file(s) copied.
  212.  
  213. E:\z\JRJ>copy The_Task_of_a_Philosopher.txt.04.txt Gallowwalker.4.txt.sorted
  214.        1 file(s) copied.
  215.  
  216. E:\z\JRJ>copy The_Task_of_a_Philosopher.txt.05.txt Gallowwalker.5.txt.sorted
  217.        1 file(s) copied.
  218.  
  219. E:\z\JRJ>dir
  220.  Volume in drive E is Sanmayce_111GB
  221.  Volume Serial Number is 1410-10F9
  222.  
  223.  Directory of E:\z\JRJ
  224.  
  225. 10/13/2021  03:34 AM    <DIR>          .
  226. 10/13/2021  03:34 AM    <DIR>          ..
  227. 10/13/2021  03:34 AM            25,290 Gallowwalker.1.txt.sorted
  228. 10/13/2021  03:34 AM            74,892 Gallowwalker.2.txt.sorted
  229. 10/13/2021  03:34 AM           101,334 Gallowwalker.3.txt.sorted
  230. 10/13/2021  03:34 AM           113,997 Gallowwalker.4.txt.sorted
  231. 10/13/2021  03:34 AM           120,996 Gallowwalker.5.txt.sorted
  232. 04/22/2021  05:36 AM    <DIR>          JRJ_essay
  233. 04/22/2021  05:37 AM           109,582 Kazahana_Hexadecad_GCC_102_32bit.exe
  234. 04/22/2021  05:37 AM         2,494,334 Kazahana_sources_binaries.zip
  235. 04/22/2021  05:37 AM           135,680 Leprechaun_x-leton_32bit_Intel_01_001p.exe
  236. 04/22/2021  05:37 AM           134,144 Leprechaun_x-leton_32bit_Intel_01_008p.exe
  237. 04/22/2021  05:37 AM           134,144 Leprechaun_x-leton_32bit_Intel_01_512p.exe
  238. 04/22/2021  05:37 AM           139,264 Leprechaun_x-leton_32bit_Intel_02_001p.exe
  239. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_02_008p.exe
  240. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_02_512p.exe
  241. 04/22/2021  05:37 AM           139,264 Leprechaun_x-leton_32bit_Intel_03_001p.exe
  242. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_03_008p.exe
  243. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_03_512p.exe
  244. 04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_04_001p.exe
  245. 04/22/2021  05:37 AM           136,704 Leprechaun_x-leton_32bit_Intel_04_008p.exe
  246. 04/22/2021  05:37 AM           136,704 Leprechaun_x-leton_32bit_Intel_04_512p.exe
  247. 04/22/2021  05:37 AM           139,776 Leprechaun_x-leton_32bit_Intel_05_001p.exe
  248. 04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_05_008p.exe
  249. 04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_05_512p.exe
  250. 04/22/2021  05:37 AM           695,925 Leprechaun_x-leton_r17tag.7z
  251. 04/22/2021  05:37 AM            96,993 LineJustify_PAGODAo5.c
  252. 04/22/2021  05:37 AM            73,728 LineJustify_PAGODAo5.exe
  253. 04/22/2021  05:37 AM            94,162 LineWordreporter.c
  254. 04/22/2021  05:37 AM            77,312 LineWordreporter.exe
  255. 04/22/2021  05:37 AM             1,633 MokujIN JADE 217 prompt.lnk
  256. 04/22/2021  05:37 AM            28,393 The_Task_of_a_Philosopher.txt
  257. 10/13/2021  03:34 AM            25,290 The_Task_of_a_Philosopher.txt.01.txt
  258. 10/13/2021  03:34 AM            74,892 The_Task_of_a_Philosopher.txt.02.txt
  259. 10/13/2021  03:34 AM           101,334 The_Task_of_a_Philosopher.txt.03.txt
  260. 10/13/2021  03:34 AM           113,997 The_Task_of_a_Philosopher.txt.04.txt
  261. 10/13/2021  03:34 AM           120,996 The_Task_of_a_Philosopher.txt.05.txt
  262. 04/22/2021  05:37 AM             3,335 XGRAM_PAGODA5.bat
  263. 04/22/2021  05:37 AM             2,181 XGRAM_RIP_file.bat
  264. 04/22/2021  05:37 AM            35,015 Yoshi.exe
  265. 04/22/2021  05:37 AM           969,892 Yoshi7-.zip
  266.              38 File(s)      7,616,815 bytes
  267.               3 Dir(s)   3,711,393,792 bytes free
  268.  
  269. E:\z\JRJ>time
  270. The current time is:  3:34:58.06
  271. Enter the new time:
  272.  
  273. E:\z\JRJ>XGRAM_PAGODA5.bat take
  274.  
  275. E:\z\JRJ>if exist Gallowwalker.1.txt.sorted goto OK1
  276.  
  277. E:\z\JRJ>if exist Gallowwalker.2.txt.sorted goto OK2
  278.  
  279. E:\z\JRJ>if exist Gallowwalker.3.txt.sorted goto OK3
  280.  
  281. E:\z\JRJ>if exist Gallowwalker.4.txt.sorted goto OK4
  282.  
  283. E:\z\JRJ>if exist Gallowwalker.5.txt.sorted goto OK5
  284.  
  285. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`take" Gallowwalker.1.txt.sorted 1023
  286. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  287. Enforcing Case Sensitive wildcard mode ...
  288. Enforcing SLOW wildcard mode ...
  289. Current priority class is REALTIME_PRIORITY_CLASS.
  290. Pattern: `take
  291. omp_get_num_procs( ) = 2
  292. omp_get_max_threads( ) = 2
  293. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  294. Allocating Master-Buffer 1023KB ... OK
  295. |; Speed: 00,000,000,000 bytes/clock; Traversed: 25,290 bytes
  296. Kazahana: Total/Checked/Dumped xgrams: 1,313/1,313/1
  297. Kazahana: Performance: 0 KB/clock
  298. Kazahana: Performance: 16 xgrams/clock
  299. Kazahana: Performance: Total/fread() clocks: 79/0
  300. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  301. Kazahana: Done.
  302.  
  303. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.1-1.txt
  304.  
  305. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`take_." Gallowwalker.2.txt.sorted 1023
  306. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  307. Enforcing Case Sensitive wildcard mode ...
  308. Enforcing SLOW wildcard mode ...
  309. Current priority class is REALTIME_PRIORITY_CLASS.
  310. Pattern: `take_.
  311. omp_get_num_procs( ) = 2
  312. omp_get_max_threads( ) = 2
  313. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  314. Allocating Master-Buffer 1023KB ... OK
  315. |; Speed: 00,000,000,000 bytes/clock; Traversed: 74,892 bytes
  316. Kazahana: Total/Checked/Dumped xgrams: 3,133/3,133/4
  317. Kazahana: Performance: 0 KB/clock
  318. Kazahana: Performance: 32 xgrams/clock
  319. Kazahana: Performance: Total/fread() clocks: 95/0
  320. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  321. Kazahana: Done.
  322.  
  323. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.2-1.txt
  324.  
  325. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._take" Gallowwalker.2.txt.sorted 1023
  326. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  327. Enforcing Case Sensitive wildcard mode ...
  328. Enforcing SLOW wildcard mode ...
  329. Current priority class is REALTIME_PRIORITY_CLASS.
  330. Pattern: `._take
  331. omp_get_num_procs( ) = 2
  332. omp_get_max_threads( ) = 2
  333. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  334. Allocating Master-Buffer 1023KB ... OK
  335. |; Speed: 00,000,000,000 bytes/clock; Traversed: 74,892 bytes
  336. Kazahana: Total/Checked/Dumped xgrams: 3,133/3,133/4
  337. Kazahana: Performance: 0 KB/clock
  338. Kazahana: Performance: 39 xgrams/clock
  339. Kazahana: Performance: Total/fread() clocks: 79/0
  340. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  341. Kazahana: Done.
  342.  
  343. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.2-2.txt
  344.  
  345. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`take_._." Gallowwalker.3.txt.sorted 1023
  346. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  347. Enforcing Case Sensitive wildcard mode ...
  348. Enforcing SLOW wildcard mode ...
  349. Current priority class is REALTIME_PRIORITY_CLASS.
  350. Pattern: `take_._.
  351. omp_get_num_procs( ) = 2
  352. omp_get_max_threads( ) = 2
  353. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  354. Allocating Master-Buffer 1023KB ... OK
  355. |; Speed: 00,000,000,000 bytes/clock; Traversed: 101,334 bytes
  356. Kazahana: Total/Checked/Dumped xgrams: 3,430/3,430/3
  357. Kazahana: Performance: 1 KB/clock
  358. Kazahana: Performance: 43 xgrams/clock
  359. Kazahana: Performance: Total/fread() clocks: 79/0
  360. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  361. Kazahana: Done.
  362.  
  363. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.3-1.txt
  364.  
  365. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._take_." Gallowwalker.3.txt.sorted 1023
  366. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  367. Enforcing Case Sensitive wildcard mode ...
  368. Enforcing SLOW wildcard mode ...
  369. Current priority class is REALTIME_PRIORITY_CLASS.
  370. Pattern: `._take_.
  371. omp_get_num_procs( ) = 2
  372. omp_get_max_threads( ) = 2
  373. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  374. Allocating Master-Buffer 1023KB ... OK
  375. |; Speed: 00,000,000,000 bytes/clock; Traversed: 101,334 bytes
  376. Kazahana: Total/Checked/Dumped xgrams: 3,430/3,430/4
  377. Kazahana: Performance: 1 KB/clock
  378. Kazahana: Performance: 43 xgrams/clock
  379. Kazahana: Performance: Total/fread() clocks: 79/0
  380. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  381. Kazahana: Done.
  382.  
  383. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.3-2.txt
  384.  
  385. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._._take" Gallowwalker.3.txt.sorted 1023
  386. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  387. Enforcing Case Sensitive wildcard mode ...
  388. Enforcing SLOW wildcard mode ...
  389. Current priority class is REALTIME_PRIORITY_CLASS.
  390. Pattern: `._._take
  391. omp_get_num_procs( ) = 2
  392. omp_get_max_threads( ) = 2
  393. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  394. Allocating Master-Buffer 1023KB ... OK
  395. |; Speed: 00,000,000,000 bytes/clock; Traversed: 101,334 bytes
  396. Kazahana: Total/Checked/Dumped xgrams: 3,430/3,430/4
  397. Kazahana: Performance: 1 KB/clock
  398. Kazahana: Performance: 43 xgrams/clock
  399. Kazahana: Performance: Total/fread() clocks: 79/0
  400. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  401. Kazahana: Done.
  402.  
  403. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.3-3.txt
  404.  
  405. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`take_._._." Gallowwalker.4.txt.sorted 1023
  406. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  407. Enforcing Case Sensitive wildcard mode ...
  408. Enforcing SLOW wildcard mode ...
  409. Current priority class is REALTIME_PRIORITY_CLASS.
  410. Pattern: `take_._._.
  411. omp_get_num_procs( ) = 2
  412. omp_get_max_threads( ) = 2
  413. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  414. Allocating Master-Buffer 1023KB ... OK
  415. |; Speed: 00,000,000,000 bytes/clock; Traversed: 113,997 bytes
  416. Kazahana: Total/Checked/Dumped xgrams: 3,205/3,205/2
  417. Kazahana: Performance: 1 KB/clock
  418. Kazahana: Performance: 40 xgrams/clock
  419. Kazahana: Performance: Total/fread() clocks: 79/0
  420. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  421. Kazahana: Done.
  422.  
  423. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.4-1.txt
  424.  
  425. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._take_._." Gallowwalker.4.txt.sorted 1023
  426. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  427. Enforcing Case Sensitive wildcard mode ...
  428. Enforcing SLOW wildcard mode ...
  429. Current priority class is REALTIME_PRIORITY_CLASS.
  430. Pattern: `._take_._.
  431. omp_get_num_procs( ) = 2
  432. omp_get_max_threads( ) = 2
  433. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  434. Allocating Master-Buffer 1023KB ... OK
  435. |; Speed: 00,000,000,000 bytes/clock; Traversed: 113,997 bytes
  436. Kazahana: Total/Checked/Dumped xgrams: 3,205/3,205/3
  437. Kazahana: Performance: 1 KB/clock
  438. Kazahana: Performance: 40 xgrams/clock
  439. Kazahana: Performance: Total/fread() clocks: 79/0
  440. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  441. Kazahana: Done.
  442.  
  443. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.4-2.txt
  444.  
  445. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._._take_." Gallowwalker.4.txt.sorted 1023
  446. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  447. Enforcing Case Sensitive wildcard mode ...
  448. Enforcing SLOW wildcard mode ...
  449. Current priority class is REALTIME_PRIORITY_CLASS.
  450. Pattern: `._._take_.
  451. omp_get_num_procs( ) = 2
  452. omp_get_max_threads( ) = 2
  453. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  454. Allocating Master-Buffer 1023KB ... OK
  455. |; Speed: 00,000,000,000 bytes/clock; Traversed: 113,997 bytes
  456. Kazahana: Total/Checked/Dumped xgrams: 3,205/3,205/4
  457. Kazahana: Performance: 1 KB/clock
  458. Kazahana: Performance: 40 xgrams/clock
  459. Kazahana: Performance: Total/fread() clocks: 79/0
  460. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  461. Kazahana: Done.
  462.  
  463. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.4-3.txt
  464.  
  465. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._._._take" Gallowwalker.4.txt.sorted 1023
  466. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  467. Enforcing Case Sensitive wildcard mode ...
  468. Enforcing SLOW wildcard mode ...
  469. Current priority class is REALTIME_PRIORITY_CLASS.
  470. Pattern: `._._._take
  471. omp_get_num_procs( ) = 2
  472. omp_get_max_threads( ) = 2
  473. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  474. Allocating Master-Buffer 1023KB ... OK
  475. |; Speed: 00,000,000,000 bytes/clock; Traversed: 113,997 bytes
  476. Kazahana: Total/Checked/Dumped xgrams: 3,205/3,205/4
  477. Kazahana: Performance: 1 KB/clock
  478. Kazahana: Performance: 40 xgrams/clock
  479. Kazahana: Performance: Total/fread() clocks: 79/0
  480. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  481. Kazahana: Done.
  482.  
  483. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.4-4.txt
  484.  
  485. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`take_._._._." Gallowwalker.5.txt.sorted 1023
  486. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  487. Enforcing Case Sensitive wildcard mode ...
  488. Enforcing SLOW wildcard mode ...
  489. Current priority class is REALTIME_PRIORITY_CLASS.
  490. Pattern: `take_._._._.
  491. omp_get_num_procs( ) = 2
  492. omp_get_max_threads( ) = 2
  493. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  494. Allocating Master-Buffer 1023KB ... OK
  495. |; Speed: 00,000,000,000 bytes/clock; Traversed: 120,996 bytes
  496. Kazahana: Total/Checked/Dumped xgrams: 2,905/2,905/2
  497. Kazahana: Performance: 1 KB/clock
  498. Kazahana: Performance: 36 xgrams/clock
  499. Kazahana: Performance: Total/fread() clocks: 79/0
  500. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  501. Kazahana: Done.
  502.  
  503. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.5-1.txt
  504.  
  505. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._take_._._." Gallowwalker.5.txt.sorted 1023
  506. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  507. Enforcing Case Sensitive wildcard mode ...
  508. Enforcing SLOW wildcard mode ...
  509. Current priority class is REALTIME_PRIORITY_CLASS.
  510. Pattern: `._take_._._.
  511. omp_get_num_procs( ) = 2
  512. omp_get_max_threads( ) = 2
  513. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  514. Allocating Master-Buffer 1023KB ... OK
  515. |; Speed: 00,000,000,000 bytes/clock; Traversed: 120,996 bytes
  516. Kazahana: Total/Checked/Dumped xgrams: 2,905/2,905/2
  517. Kazahana: Performance: 1 KB/clock
  518. Kazahana: Performance: 36 xgrams/clock
  519. Kazahana: Performance: Total/fread() clocks: 79/0
  520. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  521. Kazahana: Done.
  522.  
  523. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.5-2.txt
  524.  
  525. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._._take_._." Gallowwalker.5.txt.sorted 1023
  526. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  527. Enforcing Case Sensitive wildcard mode ...
  528. Enforcing SLOW wildcard mode ...
  529. Current priority class is REALTIME_PRIORITY_CLASS.
  530. Pattern: `._._take_._.
  531. omp_get_num_procs( ) = 2
  532. omp_get_max_threads( ) = 2
  533. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  534. Allocating Master-Buffer 1023KB ... OK
  535. |; Speed: 00,000,000,000 bytes/clock; Traversed: 120,996 bytes
  536. Kazahana: Total/Checked/Dumped xgrams: 2,905/2,905/3
  537. Kazahana: Performance: 1 KB/clock
  538. Kazahana: Performance: 36 xgrams/clock
  539. Kazahana: Performance: Total/fread() clocks: 79/0
  540. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  541. Kazahana: Done.
  542.  
  543. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.5-3.txt
  544.  
  545. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._._._take_." Gallowwalker.5.txt.sorted 1023
  546. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  547. Enforcing Case Sensitive wildcard mode ...
  548. Enforcing SLOW wildcard mode ...
  549. Current priority class is REALTIME_PRIORITY_CLASS.
  550. Pattern: `._._._take_.
  551. omp_get_num_procs( ) = 2
  552. omp_get_max_threads( ) = 2
  553. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  554. Allocating Master-Buffer 1023KB ... OK
  555. |; Speed: 00,000,000,000 bytes/clock; Traversed: 120,996 bytes
  556. Kazahana: Total/Checked/Dumped xgrams: 2,905/2,905/4
  557. Kazahana: Performance: 1 KB/clock
  558. Kazahana: Performance: 36 xgrams/clock
  559. Kazahana: Performance: Total/fread() clocks: 79/0
  560. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  561. Kazahana: Done.
  562.  
  563. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.5-4.txt
  564.  
  565. E:\z\JRJ>Kazahana_Hexadecad_GCC_102_32bit.exe "`._._._._take" Gallowwalker.5.txt.sorted 1023
  566. Kazahana, a typhoon-class exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE_Trolldom, copyleft Kaze 2019-May-21.
  567. Enforcing Case Sensitive wildcard mode ...
  568. Enforcing SLOW wildcard mode ...
  569. Current priority class is REALTIME_PRIORITY_CLASS.
  570. Pattern: `._._._._take
  571. omp_get_num_procs( ) = 2
  572. omp_get_max_threads( ) = 2
  573. Enforcing HEXADECAD i.e. hexadecuple-threads ...
  574. Allocating Master-Buffer 1023KB ... OK
  575. |; Speed: 00,000,000,000 bytes/clock; Traversed: 120,996 bytes
  576. Kazahana: Total/Checked/Dumped xgrams: 2,905/2,905/3
  577. Kazahana: Performance: 1 KB/clock
  578. Kazahana: Performance: 36 xgrams/clock
  579. Kazahana: Performance: Total/fread() clocks: 79/0
  580. Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
  581. Kazahana: Done.
  582.  
  583. E:\z\JRJ>sort /R Kazahana.txt /O Kazahana_take.5-5.txt
  584.  
  585. E:\z\JRJ>dir Kazahana_take.*.txt/b/on 1>q
  586.  
  587. E:\z\JRJ>LineJustify_PAGODAo5.exe q
  588. LineJustify_PAGODAo5, revision 1, written by Kaze.
  589. Purpose: Padding the left side of x-grams with SPACEs in order to form the main pillar.
  590. Example:
  591. D:\>LineWordreporter.exe LineJustify_PAGODAo5.lst
  592. Note: Files can exceed 4GB limit.
  593. Buffered PADDing ...
  594.  
  595. E:\z\JRJ>copy Kazahana_take.*.txt.PAD Kazahana_take.PAGODA-order-5.txt/b/y
  596. Kazahana_take.1-1.txt.PAD
  597. Kazahana_take.2-1.txt.PAD
  598. Kazahana_take.2-2.txt.PAD
  599. Kazahana_take.3-1.txt.PAD
  600. Kazahana_take.3-2.txt.PAD
  601. Kazahana_take.3-3.txt.PAD
  602. Kazahana_take.4-1.txt.PAD
  603. Kazahana_take.4-2.txt.PAD
  604. Kazahana_take.4-3.txt.PAD
  605. Kazahana_take.4-4.txt.PAD
  606. Kazahana_take.5-1.txt.PAD
  607. Kazahana_take.5-2.txt.PAD
  608. Kazahana_take.5-3.txt.PAD
  609. Kazahana_take.5-4.txt.PAD
  610. Kazahana_take.5-5.txt.PAD
  611.        1 file(s) copied.
  612.  
  613. E:\z\JRJ>LineWordreporter.exe q 1>>schisch.log
  614.  
  615. E:\z\JRJ>echo.1>>schisch.log
  616.  
  617. E:\z\JRJ>Leprechaun_x-leton_32bit_Intel_01_008p.exe q q.wrd 1234567 Y
  618. Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 17, written by Svalqyatchx.
  619. Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
  620. Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
  621. Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
  622. Feature3: In this revision, 8 passes are to be made.
  623. Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
  624. Pass #1 of 8:
  625. Size of input file with files for Leprechauning: 345
  626. Allocating HASH memory 536,870,977 bytes ... OK
  627. Allocating memory 1206MB ... OK
  628. Size of Input TEXTual file: 16
  629. /; 00,000,001P/s; Phrase count: 1 of them 1 distinct; Done: 64/64
  630. Size of Input TEXTual file: 81
  631. -; 00,000,009P/s; Phrase count: 9 of them 1 distinct; Done: 64/64
  632. Size of Input TEXTual file: 87
  633. \; 00,000,017P/s; Phrase count: 17 of them 2 distinct; Done: 64/64
  634. Size of Input TEXTual file: 75
  635. |; 00,000,026P/s; Phrase count: 26 of them 3 distinct; Done: 64/64
  636. Size of Input TEXTual file: 104
  637. /; 00,000,038P/s; Phrase count: 38 of them 3 distinct; Done: 64/64
  638. Size of Input TEXTual file: 121
  639. -; 00,000,050P/s; Phrase count: 50 of them 4 distinct; Done: 64/64
  640. Size of Input TEXTual file: 59
  641. \; 00,000,058P/s; Phrase count: 58 of them 5 distinct; Done: 64/64
  642. Size of Input TEXTual file: 89
  643. |; 00,000,070P/s; Phrase count: 70 of them 5 distinct; Done: 64/64
  644. Size of Input TEXTual file: 138
  645. /; 00,000,086P/s; Phrase count: 86 of them 5 distinct; Done: 64/64
  646. Size of Input TEXTual file: 148
  647. -; 00,000,102P/s; Phrase count: 102 of them 5 distinct; Done: 64/64
  648. Size of Input TEXTual file: 65
  649. \; 00,000,112P/s; Phrase count: 112 of them 5 distinct; Done: 64/64
  650. Size of Input TEXTual file: 69
  651. |; 00,000,122P/s; Phrase count: 122 of them 5 distinct; Done: 64/64
  652. Size of Input TEXTual file: 112
  653. /; 00,000,137P/s; Phrase count: 137 of them 5 distinct; Done: 64/64
  654. Size of Input TEXTual file: 165
  655. -; 00,000,157P/s; Phrase count: 157 of them 5 distinct; Done: 64/64
  656. Size of Input TEXTual file: 133
  657. \; 00,000,172P/s; Phrase count: 172 of them 5 distinct; Done: 64/64
  658. Bytes per second performance: 1,462B/s
  659. Phrases per second performance: 172P/s
  660. Time for putting phrases into trees: 1 second(s)
  661. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,010P/s
  662. Time for shaking phrases from trees: 1 second(s)
  663. Leprechaun: Current pass done.
  664. Pass #2 of 8:
  665. Size of input file with files for Leprechauning: 345
  666. Allocating HASH memory 536,870,977 bytes ... OK
  667. Allocating memory 1206MB ... OK
  668. Size of Input TEXTual file: 16
  669. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  670. Size of Input TEXTual file: 81
  671. -; 00,000,009P/s; Phrase count: 9 of them 1 distinct; Done: 64/64
  672. Size of Input TEXTual file: 87
  673. \; 00,000,017P/s; Phrase count: 17 of them 3 distinct; Done: 64/64
  674. Size of Input TEXTual file: 75
  675. |; 00,000,026P/s; Phrase count: 26 of them 4 distinct; Done: 64/64
  676. Size of Input TEXTual file: 104
  677. /; 00,000,038P/s; Phrase count: 38 of them 4 distinct; Done: 64/64
  678. Size of Input TEXTual file: 121
  679. -; 00,000,050P/s; Phrase count: 50 of them 4 distinct; Done: 64/64
  680. Size of Input TEXTual file: 59
  681. \; 00,000,058P/s; Phrase count: 58 of them 4 distinct; Done: 64/64
  682. Size of Input TEXTual file: 89
  683. |; 00,000,070P/s; Phrase count: 70 of them 4 distinct; Done: 64/64
  684. Size of Input TEXTual file: 138
  685. /; 00,000,086P/s; Phrase count: 86 of them 4 distinct; Done: 64/64
  686. Size of Input TEXTual file: 148
  687. -; 00,000,102P/s; Phrase count: 102 of them 4 distinct; Done: 64/64
  688. Size of Input TEXTual file: 65
  689. \; 00,000,112P/s; Phrase count: 112 of them 4 distinct; Done: 64/64
  690. Size of Input TEXTual file: 69
  691. |; 00,000,122P/s; Phrase count: 122 of them 4 distinct; Done: 64/64
  692. Size of Input TEXTual file: 112
  693. /; 00,000,137P/s; Phrase count: 137 of them 4 distinct; Done: 64/64
  694. Size of Input TEXTual file: 165
  695. -; 00,000,157P/s; Phrase count: 157 of them 4 distinct; Done: 64/64
  696. Size of Input TEXTual file: 133
  697. \; 00,000,172P/s; Phrase count: 172 of them 4 distinct; Done: 64/64
  698. Bytes per second performance: 1,462B/s
  699. Phrases per second performance: 172P/s
  700. Time for putting phrases into trees: 1 second(s)
  701. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,008P/s
  702. Time for shaking phrases from trees: 1 second(s)
  703. Leprechaun: Current pass done.
  704. Pass #3 of 8:
  705. Size of input file with files for Leprechauning: 345
  706. Allocating HASH memory 536,870,977 bytes ... OK
  707. Allocating memory 1206MB ... OK
  708. Size of Input TEXTual file: 16
  709. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  710. Size of Input TEXTual file: 81
  711. -; 00,000,009P/s; Phrase count: 9 of them 0 distinct; Done: 64/64
  712. Size of Input TEXTual file: 87
  713. \; 00,000,017P/s; Phrase count: 17 of them 0 distinct; Done: 64/64
  714. Size of Input TEXTual file: 75
  715. |; 00,000,026P/s; Phrase count: 26 of them 0 distinct; Done: 64/64
  716. Size of Input TEXTual file: 104
  717. /; 00,000,038P/s; Phrase count: 38 of them 0 distinct; Done: 64/64
  718. Size of Input TEXTual file: 121
  719. -; 00,000,050P/s; Phrase count: 50 of them 1 distinct; Done: 64/64
  720. Size of Input TEXTual file: 59
  721. \; 00,000,058P/s; Phrase count: 58 of them 2 distinct; Done: 64/64
  722. Size of Input TEXTual file: 89
  723. |; 00,000,070P/s; Phrase count: 70 of them 2 distinct; Done: 64/64
  724. Size of Input TEXTual file: 138
  725. /; 00,000,086P/s; Phrase count: 86 of them 2 distinct; Done: 64/64
  726. Size of Input TEXTual file: 148
  727. -; 00,000,102P/s; Phrase count: 102 of them 2 distinct; Done: 64/64
  728. Size of Input TEXTual file: 65
  729. \; 00,000,112P/s; Phrase count: 112 of them 2 distinct; Done: 64/64
  730. Size of Input TEXTual file: 69
  731. |; 00,000,122P/s; Phrase count: 122 of them 2 distinct; Done: 64/64
  732. Size of Input TEXTual file: 112
  733. /; 00,000,137P/s; Phrase count: 137 of them 2 distinct; Done: 64/64
  734. Size of Input TEXTual file: 165
  735. -; 00,000,157P/s; Phrase count: 157 of them 2 distinct; Done: 64/64
  736. Size of Input TEXTual file: 133
  737. \; 00,000,172P/s; Phrase count: 172 of them 2 distinct; Done: 64/64
  738. Bytes per second performance: 1,462B/s
  739. Phrases per second performance: 172P/s
  740. Time for putting phrases into trees: 1 second(s)
  741. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,004P/s
  742. Time for shaking phrases from trees: 1 second(s)
  743. Leprechaun: Current pass done.
  744. Pass #4 of 8:
  745. Size of input file with files for Leprechauning: 345
  746. Allocating HASH memory 536,870,977 bytes ... OK
  747. Allocating memory 1206MB ... OK
  748. Size of Input TEXTual file: 16
  749. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  750. Size of Input TEXTual file: 81
  751. -; 00,000,009P/s; Phrase count: 9 of them 0 distinct; Done: 64/64
  752. Size of Input TEXTual file: 87
  753. \; 00,000,017P/s; Phrase count: 17 of them 0 distinct; Done: 64/64
  754. Size of Input TEXTual file: 75
  755. |; 00,000,026P/s; Phrase count: 26 of them 0 distinct; Done: 64/64
  756. Size of Input TEXTual file: 104
  757. /; 00,000,038P/s; Phrase count: 38 of them 0 distinct; Done: 64/64
  758. Size of Input TEXTual file: 121
  759. -; 00,000,050P/s; Phrase count: 50 of them 1 distinct; Done: 64/64
  760. Size of Input TEXTual file: 59
  761. \; 00,000,058P/s; Phrase count: 58 of them 1 distinct; Done: 64/64
  762. Size of Input TEXTual file: 89
  763. |; 00,000,070P/s; Phrase count: 70 of them 1 distinct; Done: 64/64
  764. Size of Input TEXTual file: 138
  765. /; 00,000,086P/s; Phrase count: 86 of them 1 distinct; Done: 64/64
  766. Size of Input TEXTual file: 148
  767. -; 00,000,102P/s; Phrase count: 102 of them 1 distinct; Done: 64/64
  768. Size of Input TEXTual file: 65
  769. \; 00,000,112P/s; Phrase count: 112 of them 1 distinct; Done: 64/64
  770. Size of Input TEXTual file: 69
  771. |; 00,000,122P/s; Phrase count: 122 of them 1 distinct; Done: 64/64
  772. Size of Input TEXTual file: 112
  773. /; 00,000,137P/s; Phrase count: 137 of them 1 distinct; Done: 64/64
  774. Size of Input TEXTual file: 165
  775. -; 00,000,157P/s; Phrase count: 157 of them 1 distinct; Done: 64/64
  776. Size of Input TEXTual file: 133
  777. \; 00,000,172P/s; Phrase count: 172 of them 1 distinct; Done: 64/64
  778. Bytes per second performance: 1,462B/s
  779. Phrases per second performance: 172P/s
  780. Time for putting phrases into trees: 1 second(s)
  781. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,002P/s
  782. Time for shaking phrases from trees: 1 second(s)
  783. Leprechaun: Current pass done.
  784. Pass #5 of 8:
  785. Size of input file with files for Leprechauning: 345
  786. Allocating HASH memory 536,870,977 bytes ... OK
  787. Allocating memory 1206MB ... OK
  788. Size of Input TEXTual file: 16
  789. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  790. Size of Input TEXTual file: 81
  791. -; 00,000,009P/s; Phrase count: 9 of them 1 distinct; Done: 64/64
  792. Size of Input TEXTual file: 87
  793. \; 00,000,017P/s; Phrase count: 17 of them 1 distinct; Done: 64/64
  794. Size of Input TEXTual file: 75
  795. |; 00,000,026P/s; Phrase count: 26 of them 1 distinct; Done: 64/64
  796. Size of Input TEXTual file: 104
  797. /; 00,000,038P/s; Phrase count: 38 of them 1 distinct; Done: 64/64
  798. Size of Input TEXTual file: 121
  799. -; 00,000,050P/s; Phrase count: 50 of them 1 distinct; Done: 64/64
  800. Size of Input TEXTual file: 59
  801. \; 00,000,058P/s; Phrase count: 58 of them 1 distinct; Done: 64/64
  802. Size of Input TEXTual file: 89
  803. |; 00,000,070P/s; Phrase count: 70 of them 1 distinct; Done: 64/64
  804. Size of Input TEXTual file: 138
  805. /; 00,000,086P/s; Phrase count: 86 of them 1 distinct; Done: 64/64
  806. Size of Input TEXTual file: 148
  807. -; 00,000,102P/s; Phrase count: 102 of them 1 distinct; Done: 64/64
  808. Size of Input TEXTual file: 65
  809. \; 00,000,112P/s; Phrase count: 112 of them 2 distinct; Done: 64/64
  810. Size of Input TEXTual file: 69
  811. |; 00,000,122P/s; Phrase count: 122 of them 2 distinct; Done: 64/64
  812. Size of Input TEXTual file: 112
  813. /; 00,000,137P/s; Phrase count: 137 of them 2 distinct; Done: 64/64
  814. Size of Input TEXTual file: 165
  815. -; 00,000,157P/s; Phrase count: 157 of them 2 distinct; Done: 64/64
  816. Size of Input TEXTual file: 133
  817. \; 00,000,172P/s; Phrase count: 172 of them 3 distinct; Done: 64/64
  818. Bytes per second performance: 1,462B/s
  819. Phrases per second performance: 172P/s
  820. Time for putting phrases into trees: 1 second(s)
  821. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,006P/s
  822. Time for shaking phrases from trees: 1 second(s)
  823. Leprechaun: Current pass done.
  824. Pass #6 of 8:
  825. Size of input file with files for Leprechauning: 345
  826. Allocating HASH memory 536,870,977 bytes ... OK
  827. Allocating memory 1206MB ... OK
  828. Size of Input TEXTual file: 16
  829. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  830. Size of Input TEXTual file: 81
  831. -; 00,000,009P/s; Phrase count: 9 of them 1 distinct; Done: 64/64
  832. Size of Input TEXTual file: 87
  833. \; 00,000,017P/s; Phrase count: 17 of them 1 distinct; Done: 64/64
  834. Size of Input TEXTual file: 75
  835. |; 00,000,026P/s; Phrase count: 26 of them 2 distinct; Done: 64/64
  836. Size of Input TEXTual file: 104
  837. /; 00,000,038P/s; Phrase count: 38 of them 2 distinct; Done: 64/64
  838. Size of Input TEXTual file: 121
  839. -; 00,000,050P/s; Phrase count: 50 of them 2 distinct; Done: 64/64
  840. Size of Input TEXTual file: 59
  841. \; 00,000,058P/s; Phrase count: 58 of them 2 distinct; Done: 64/64
  842. Size of Input TEXTual file: 89
  843. |; 00,000,070P/s; Phrase count: 70 of them 2 distinct; Done: 64/64
  844. Size of Input TEXTual file: 138
  845. /; 00,000,086P/s; Phrase count: 86 of them 2 distinct; Done: 64/64
  846. Size of Input TEXTual file: 148
  847. -; 00,000,102P/s; Phrase count: 102 of them 3 distinct; Done: 64/64
  848. Size of Input TEXTual file: 65
  849. \; 00,000,112P/s; Phrase count: 112 of them 3 distinct; Done: 64/64
  850. Size of Input TEXTual file: 69
  851. |; 00,000,122P/s; Phrase count: 122 of them 3 distinct; Done: 64/64
  852. Size of Input TEXTual file: 112
  853. /; 00,000,137P/s; Phrase count: 137 of them 3 distinct; Done: 64/64
  854. Size of Input TEXTual file: 165
  855. -; 00,000,157P/s; Phrase count: 157 of them 3 distinct; Done: 64/64
  856. Size of Input TEXTual file: 133
  857. \; 00,000,172P/s; Phrase count: 172 of them 4 distinct; Done: 64/64
  858. Bytes per second performance: 1,462B/s
  859. Phrases per second performance: 172P/s
  860. Time for putting phrases into trees: 1 second(s)
  861. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,008P/s
  862. Time for shaking phrases from trees: 1 second(s)
  863. Leprechaun: Current pass done.
  864. Pass #7 of 8:
  865. Size of input file with files for Leprechauning: 345
  866. Allocating HASH memory 536,870,977 bytes ... OK
  867. Allocating memory 1206MB ... OK
  868. Size of Input TEXTual file: 16
  869. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  870. Size of Input TEXTual file: 81
  871. -; 00,000,009P/s; Phrase count: 9 of them 1 distinct; Done: 64/64
  872. Size of Input TEXTual file: 87
  873. \; 00,000,017P/s; Phrase count: 17 of them 2 distinct; Done: 64/64
  874. Size of Input TEXTual file: 75
  875. |; 00,000,026P/s; Phrase count: 26 of them 2 distinct; Done: 64/64
  876. Size of Input TEXTual file: 104
  877. /; 00,000,038P/s; Phrase count: 38 of them 2 distinct; Done: 64/64
  878. Size of Input TEXTual file: 121
  879. -; 00,000,050P/s; Phrase count: 50 of them 2 distinct; Done: 64/64
  880. Size of Input TEXTual file: 59
  881. \; 00,000,058P/s; Phrase count: 58 of them 2 distinct; Done: 64/64
  882. Size of Input TEXTual file: 89
  883. |; 00,000,070P/s; Phrase count: 70 of them 2 distinct; Done: 64/64
  884. Size of Input TEXTual file: 138
  885. /; 00,000,086P/s; Phrase count: 86 of them 2 distinct; Done: 64/64
  886. Size of Input TEXTual file: 148
  887. -; 00,000,102P/s; Phrase count: 102 of them 4 distinct; Done: 64/64
  888. Size of Input TEXTual file: 65
  889. \; 00,000,112P/s; Phrase count: 112 of them 5 distinct; Done: 64/64
  890. Size of Input TEXTual file: 69
  891. |; 00,000,122P/s; Phrase count: 122 of them 5 distinct; Done: 64/64
  892. Size of Input TEXTual file: 112
  893. /; 00,000,137P/s; Phrase count: 137 of them 5 distinct; Done: 64/64
  894. Size of Input TEXTual file: 165
  895. -; 00,000,157P/s; Phrase count: 157 of them 5 distinct; Done: 64/64
  896. Size of Input TEXTual file: 133
  897. \; 00,000,172P/s; Phrase count: 172 of them 5 distinct; Done: 64/64
  898. Bytes per second performance: 1,462B/s
  899. Phrases per second performance: 172P/s
  900. Time for putting phrases into trees: 1 second(s)
  901. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,010P/s
  902. Time for shaking phrases from trees: 1 second(s)
  903. Leprechaun: Current pass done.
  904. Pass #8 of 8:
  905. Size of input file with files for Leprechauning: 345
  906. Allocating HASH memory 536,870,977 bytes ... OK
  907. Allocating memory 1206MB ... OK
  908. Size of Input TEXTual file: 16
  909. /; 00,000,001P/s; Phrase count: 1 of them 0 distinct; Done: 64/64
  910. Size of Input TEXTual file: 81
  911. -; 00,000,009P/s; Phrase count: 9 of them 0 distinct; Done: 64/64
  912. Size of Input TEXTual file: 87
  913. \; 00,000,017P/s; Phrase count: 17 of them 0 distinct; Done: 64/64
  914. Size of Input TEXTual file: 75
  915. |; 00,000,026P/s; Phrase count: 26 of them 0 distinct; Done: 64/64
  916. Size of Input TEXTual file: 104
  917. /; 00,000,038P/s; Phrase count: 38 of them 0 distinct; Done: 64/64
  918. Size of Input TEXTual file: 121
  919. -; 00,000,050P/s; Phrase count: 50 of them 1 distinct; Done: 64/64
  920. Size of Input TEXTual file: 59
  921. \; 00,000,058P/s; Phrase count: 58 of them 1 distinct; Done: 64/64
  922. Size of Input TEXTual file: 89
  923. |; 00,000,070P/s; Phrase count: 70 of them 1 distinct; Done: 64/64
  924. Size of Input TEXTual file: 138
  925. /; 00,000,086P/s; Phrase count: 86 of them 1 distinct; Done: 64/64
  926. Size of Input TEXTual file: 148
  927. -; 00,000,102P/s; Phrase count: 102 of them 1 distinct; Done: 64/64
  928. Size of Input TEXTual file: 65
  929. \; 00,000,112P/s; Phrase count: 112 of them 1 distinct; Done: 64/64
  930. Size of Input TEXTual file: 69
  931. |; 00,000,122P/s; Phrase count: 122 of them 1 distinct; Done: 64/64
  932. Size of Input TEXTual file: 112
  933. /; 00,000,137P/s; Phrase count: 137 of them 1 distinct; Done: 64/64
  934. Size of Input TEXTual file: 165
  935. -; 00,000,157P/s; Phrase count: 157 of them 1 distinct; Done: 64/64
  936. Size of Input TEXTual file: 133
  937. \; 00,000,172P/s; Phrase count: 172 of them 1 distinct; Done: 64/64
  938. Bytes per second performance: 1,462B/s
  939. Phrases per second performance: 172P/s
  940. Time for putting phrases into trees: 1 second(s)
  941. Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,002P/s
  942. Time for shaking phrases from trees: 1 second(s)
  943. Leprechaun: Current pass done.
  944.  
  945. Total memory needed for one pass: 8KB
  946. Total distinct phrases: 25
  947. Total time: 9 second(s)
  948. Total performance: 19P/s i.e. phrases per second
  949. Leprechaun: Done.
  950.  
  951. E:\z\JRJ>sort /R q.wrd /O Kazahana_take.PAGODA-order-5.wrd
  952.  
  953. E:\z\JRJ>dir Kazahana_take.* 1>>schisch.log
  954.  
  955. E:\z\JRJ>echo.1>>schisch.log
  956. E:\z\JRJ>dir
  957.  Volume in drive E is Sanmayce_111GB
  958.  Volume Serial Number is 1410-10F9
  959.  
  960.  Directory of E:\z\JRJ
  961.  
  962. 10/13/2021  03:35 AM    <DIR>          .
  963. 10/13/2021  03:35 AM    <DIR>          ..
  964. 10/13/2021  03:34 AM            25,290 Gallowwalker.1.txt.sorted
  965. 10/13/2021  03:34 AM            74,892 Gallowwalker.2.txt.sorted
  966. 10/13/2021  03:34 AM           101,334 Gallowwalker.3.txt.sorted
  967. 10/13/2021  03:34 AM           113,997 Gallowwalker.4.txt.sorted
  968. 10/13/2021  03:34 AM           120,996 Gallowwalker.5.txt.sorted
  969. 04/22/2021  05:36 AM    <DIR>          JRJ_essay
  970. 04/22/2021  05:37 AM           109,582 Kazahana_Hexadecad_GCC_102_32bit.exe
  971. 04/22/2021  05:37 AM         2,494,334 Kazahana_sources_binaries.zip
  972. 10/13/2021  03:35 AM                16 Kazahana_take.1-1.txt
  973. 10/13/2021  03:35 AM                81 Kazahana_take.2-1.txt
  974. 10/13/2021  03:35 AM                87 Kazahana_take.2-2.txt
  975. 10/13/2021  03:35 AM                75 Kazahana_take.3-1.txt
  976. 10/13/2021  03:35 AM               104 Kazahana_take.3-2.txt
  977. 10/13/2021  03:35 AM               121 Kazahana_take.3-3.txt
  978. 10/13/2021  03:35 AM                59 Kazahana_take.4-1.txt
  979. 10/13/2021  03:35 AM                89 Kazahana_take.4-2.txt
  980. 10/13/2021  03:35 AM               138 Kazahana_take.4-3.txt
  981. 10/13/2021  03:35 AM               148 Kazahana_take.4-4.txt
  982. 10/13/2021  03:35 AM                65 Kazahana_take.5-1.txt
  983. 10/13/2021  03:35 AM                69 Kazahana_take.5-2.txt
  984. 10/13/2021  03:35 AM               112 Kazahana_take.5-3.txt
  985. 10/13/2021  03:35 AM               165 Kazahana_take.5-4.txt
  986. 10/13/2021  03:35 AM               133 Kazahana_take.5-5.txt
  987. 10/13/2021  03:35 AM             3,761 Kazahana_take.PAGODA-order-5.txt
  988. 10/13/2021  03:35 AM               423 Kazahana_take.PAGODA-order-5.wrd
  989. 04/22/2021  05:37 AM           135,680 Leprechaun_x-leton_32bit_Intel_01_001p.exe
  990. 04/22/2021  05:37 AM           134,144 Leprechaun_x-leton_32bit_Intel_01_008p.exe
  991. 04/22/2021  05:37 AM           134,144 Leprechaun_x-leton_32bit_Intel_01_512p.exe
  992. 04/22/2021  05:37 AM           139,264 Leprechaun_x-leton_32bit_Intel_02_001p.exe
  993. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_02_008p.exe
  994. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_02_512p.exe
  995. 04/22/2021  05:37 AM           139,264 Leprechaun_x-leton_32bit_Intel_03_001p.exe
  996. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_03_008p.exe
  997. 04/22/2021  05:37 AM           137,728 Leprechaun_x-leton_32bit_Intel_03_512p.exe
  998. 04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_04_001p.exe
  999. 04/22/2021  05:37 AM           136,704 Leprechaun_x-leton_32bit_Intel_04_008p.exe
  1000. 04/22/2021  05:37 AM           136,704 Leprechaun_x-leton_32bit_Intel_04_512p.exe
  1001. 04/22/2021  05:37 AM           139,776 Leprechaun_x-leton_32bit_Intel_05_001p.exe
  1002. 04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_05_008p.exe
  1003. 04/22/2021  05:37 AM           138,240 Leprechaun_x-leton_32bit_Intel_05_512p.exe
  1004. 04/22/2021  05:37 AM           695,925 Leprechaun_x-leton_r17tag.7z
  1005. 04/22/2021  05:37 AM            96,993 LineJustify_PAGODAo5.c
  1006. 04/22/2021  05:37 AM            73,728 LineJustify_PAGODAo5.exe
  1007. 04/22/2021  05:37 AM            94,162 LineWordreporter.c
  1008. 04/22/2021  05:37 AM            77,312 LineWordreporter.exe
  1009. 04/22/2021  05:37 AM             1,633 MokujIN JADE 217 prompt.lnk
  1010. 10/13/2021  03:35 AM             1,763 schisch.log
  1011. 04/22/2021  05:37 AM            28,393 The_Task_of_a_Philosopher.txt
  1012. 10/13/2021  03:34 AM            25,290 The_Task_of_a_Philosopher.txt.01.txt
  1013. 10/13/2021  03:34 AM            74,892 The_Task_of_a_Philosopher.txt.02.txt
  1014. 10/13/2021  03:34 AM           101,334 The_Task_of_a_Philosopher.txt.03.txt
  1015. 10/13/2021  03:34 AM           113,997 The_Task_of_a_Philosopher.txt.04.txt
  1016. 10/13/2021  03:34 AM           120,996 The_Task_of_a_Philosopher.txt.05.txt
  1017. 04/22/2021  05:37 AM             3,335 XGRAM_PAGODA5.bat
  1018. 04/22/2021  05:37 AM             2,181 XGRAM_RIP_file.bat
  1019. 04/22/2021  05:37 AM            35,015 Yoshi.exe
  1020. 04/22/2021  05:37 AM           969,892 Yoshi7-.zip
  1021.              56 File(s)      7,624,224 bytes
  1022.               3 Dir(s)   3,711,369,216 bytes free
  1023.  
  1024. E:\z\JRJ>time
  1025. The current time is:  3:35:40.04
  1026. Enter the new time:
  1027.  
  1028. E:\z\JRJ>notepad Kazahana_take.PAGODA-order-5.txt
  1029.  
  1030. E:\z\JRJ>
  1031.  
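
A side note on reading the log above: the per-pass distinct counts (5, 4, 2, 1, 3, 4, 5, 1) add up to the reported 25 distinct phrases, the 15 input files total 1,462 bytes, and 172 phrases over 9 seconds gives the reported 19P/s. So each pass appears to handle only a slice of the distinct phrases. Below is a minimal Python sketch of that idea, not Leprechaun's actual code: it assumes a phrase is assigned to a pass by a CRC32 hash of its text (the log does not say how the split is really done), and `extract_ngrams` and `count_distinct_in_passes` are names made up for the illustration.

```python
import zlib
from typing import Iterable, List


def extract_ngrams(text: str, order: int) -> List[str]:
    """Hypothetical helper: every run of `order` consecutive words in `text`."""
    words = text.lower().split()
    return [" ".join(words[i:i + order]) for i in range(len(words) - order + 1)]


def count_distinct_in_passes(phrases: Iterable[str], passes: int) -> List[int]:
    """Count distinct phrases in `passes` sweeps over the same input.

    Assumption (not taken from the log): a phrase belongs to the pass
    numbered crc32(phrase) % passes, so each sweep keeps only its own
    slice of the distinct phrases in memory.
    """
    phrases = list(phrases)
    per_pass = []
    for current in range(passes):
        seen = set()
        for phrase in phrases:
            if zlib.crc32(phrase.encode("utf-8")) % passes == current:
                seen.add(phrase)
        per_pass.append(len(seen))
    return per_pass


# Figures read off the log above.
log_distinct_per_pass = [5, 4, 2, 1, 3, 4, 5, 1]
log_file_sizes = [16, 81, 87, 75, 104, 121, 59, 89, 138, 148, 65, 69, 112, 165, 133]
assert sum(log_distinct_per_pass) == 25   # "Total distinct phrases: 25"
assert sum(log_file_sizes) == 1462        # "Bytes per second performance: 1,462B/s" over 1 second
assert 172 // 9 == 19                     # "Total performance: 19P/s"

# The per-pass counts always sum to the overall distinct count,
# because every phrase falls into exactly one pass.
sample = extract_ngrams("the task of a philosopher is the task of a thinker", 2)
assert sum(count_distinct_in_passes(sample, 8)) == len(set(sample))
```

The point of splitting is memory: each pass only has to hold its own slice of the distinct phrases, at the price of rereading the input once per pass.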

Also, just tonight my 1-gram corpus (the biggest on the Internet) was updated:

 
Schiza_d.png


There is a lot of noise in those 800+ million unique words, but the important thing is that the list serves as a "has it been used before" reference. The tagged .1gram corpus (23GB in size) can additionally tell how many times, and in which sub-corpus, a word appeared.

Code: QB64: [Select]
10/12/2021  06:08 AM    24,332,870,289 Schizandrafield_Corpus_revision_D_(45-corpora_-unique-words).sorted
10/12/2021  01:05 PM     9,036,707,422 Schizandrafield_Corpus_revision_D_(45-corpora_-unique-words).sorted.wrd

G:\Schiz_revision-D>type "Schizandrafield_Corpus_revision_D_(45-corpora_-unique-words).sorted.wrd"|more
a
aa
aaa
aaaa
aaaaa
aaaaaa
aaaaaaa
aaaaaaaa
aaaaaaaaa
aaaaaaaaaa
aaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaad
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaah
...
G:\Schiz_revision-D>
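
Since the word list is sorted, the "has it been used before" check can be done with a binary search. Here is a small Python sketch of that lookup, under loud assumptions: the list is held in memory (the real 9GB file would need a search over file offsets or a memory map instead), and the tagged data is modelled as a plain dictionary of per-sub-corpus counts because the actual .1gram layout is not described here. The names `has_been_used`, `usage_by_subcorpus`, `corpus_01` and `corpus_17` are all invented for the example.

```python
from bisect import bisect_left
from typing import Dict, List


def has_been_used(sorted_words: List[str], word: str) -> bool:
    """Binary search: True if `word` occurs in the sorted word list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def usage_by_subcorpus(tagged: Dict[str, Dict[str, int]], word: str) -> Dict[str, int]:
    """Hypothetical stand-in for the tagged corpus: sub-corpus name -> count."""
    return tagged.get(word, {})


# Toy data only; the real list has 800+ million entries.
words = ["a", "aa", "aaa", "aaaa"]
tagged = {"aaa": {"corpus_01": 3, "corpus_17": 1}}

assert has_been_used(words, "aaa")
assert not has_been_used(words, "ab")

counts = usage_by_subcorpus(tagged, "aaa")
assert sum(counts.values()) == 4            # seen 4 times in total
assert usage_by_subcorpus(tagged, "zzz") == {}
```

A binary search needs roughly log2(n) comparisons, so about 30 probes cover a list of 800+ million words.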

Sadly, I don't have enough storage capacity available at the moment; even compressed with bzip2, the wordlist alone is ~2GB. Of course, in the near future I will share the 23GB corpus as well, somehow...
* JRJ.7z (Filesize: 4.68 MB, Downloads: 136)
He learns not to learn and reverts to what all men pass by.

Offline Petr

  • Forum Resident
  • Posts: 1720
  • The best code is the DNA of the hops.
Re: Text Corrector
« Reply #3 on: October 13, 2021, 11:56:09 am »
Quote
Sadly, I don't have enough storage capacity available at the moment; even compressed with bzip2, the wordlist alone is ~2GB. Of course, in the near future I will share the 23GB corpus as well, somehow...

Wouldn't it be possible to use a public repository and just add a link here?

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Re: Text Corrector
« Reply #4 on: October 13, 2021, 12:01:31 pm »
Yeah, How big are Clouds?

Offline Petr

  • Forum Resident
  • Posts: 1720
  • The best code is the DNA of the hops.
Re: Text Corrector
« Reply #5 on: October 13, 2021, 12:06:15 pm »
Quote
Posted by: bplus

Yeah, How big are Clouds?

Google Drive gives about 15 gigabytes.