Messages - Sanmayce

Programs / Re: A skeleton code for Text Scroller via Drag-and-Drop

« on: March 14, 2022, 06:12:56 am »

Quote from: mdijkens on March 14, 2022, 04:45:19 am

Very interesting thread! Thank you!

I am regularly working with really big files (some up to 100GB) and have been optimizing 2 functions to read/process big files very fast.

I thought I'd add them here for information:
...

You're welcome. At the moment I have no time to make a dedicated example of a sorting tool (for fixed-length and/or variable-length keys, i.e. strings terminated by LF instead of NUL).
Let me know what interests you most and what the main bottleneck in your processing is: the parsing, the sorting, the searching...
This thread is all about these BASIC things, optimized. My wish is to share them and thus show the power of QB64; today the QB64-and-C duo resembles the old-days pairing of QB and ASM.

All these micro-projects contribute to the Masakari project, the idea being that we all have powerful and fast text-oriented routines. If you haven't looked at Masakari's text loading (somewhat similar to yours), it allows sorting lines by size or by their offset in memory (i.e. restoring the original order). The quick usage is:

Code: QB64:

Declare CustomType Library "qsm_linesize" ' Notice that 'CustomType' makes things work, using 'STATIC' gives errors during compilation
    Sub Quicksort_QB64_v7_linesize (ByVal QWORDSoff As _Offset, Byval QWORDSlen As _Offset, Byval Left As _Integer64, Byval Right As _Integer64)
End Declare
'Quicksort_QB64_v7_linesize MhandleOFF.OFFSET, MhandleLEN.OFFSET, 0, ElementsMinusOne
 

For example, do you have a ... nifty (Linux and Windows) filename parser? A month ago I got tired of not having such a function and wrote DIRWALKER. At the moment it is a skeleton that lists the current folder AS IT SHOULD. I wanted to build a tiny tool on it, but the most important work is done; the rest is just gluing it into your code. See the BAS+C source code in the attached archive. Wanna have the HEAVY-DUTY functionality of sorting by the FILESIZE, NAME and MODIFIED TIME fields/columns... for millions of files, FAST!

« on: March 13, 2022, 08:54:21 pm »

A new showdown is ready: Scandum's Crumsort v1.1.5.3 vs Magnetica r.14:

Test run: 2022-Mar-14:
Laptop "Compressionette", Intel 'Kaby Lake' i5-7200U 3.1GHz max turbo, 36GB DDR4 2133MHz:

Code:

+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+
| Performer/Keys     | #1, FEW distinct        | #2, MANY distinct        | #3, MANYmore distinct    | #4, ALL distinct          | #5, ALLmore distinct      | #6, ALLmax distinct         |
+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+
|  Operating System, |   Fedora 35, GCC 11.2.1 |    Fedora 35, GCC 11.2.1 |    Fedora 35, GCC 11.2.1 |     Fedora 35, GCC 11.2.1 |     Fedora 35, GCC 11.2.1 |       Fedora 35, GCC 11.2.1 |
|      Compiler, -O3 |       instructions; IPC |        instructions; IPC |        instructions; IPC |         instructions; IPC |         instructions; IPC |           instructions; IPC |
+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+
| qsort              |         385/342 seconds |          527/234 seconds |           165/55 seconds |           516/128 seconds |           999/252 seconds |                        N.A. |
|                    | 6,448,450,744,497; 2.82 |  4,921,980,445,033; 2.06 |  1,369,822,972,984; 1.94 |   3,250,596,357,540; 1.58 |   6,250,299,799,611; 1.59 |                        N.A. |
+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+
| Magnetica v.14     |            29/6 seconds |           198/47 seconds |            86/22 seconds |            241/66 seconds |           472/134 seconds |                        N.A. |
|                    |   171,452,642,615; 1.14 |    936,751,248,598; 1.17 |    443,844,553,192; 1.25 |   1,291,624,832,269; 1.28 |   2,503,942,225,085; 1.28 |                        N.A. |
+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+
| Bentley-McIlroy    |           38/13 seconds |           223/47 seconds |           100/26 seconds |            304/65 seconds |           597/139 seconds |                        N.A. |
|                    |   246,587,334,436; 1.23 |  1,105,253,285,645; 1.25 |    507,140,963,617; 1.23 |   1,460,132,564,676; 1.22 |   2,848,210,143,410; 1.22 |                        N.A. |
+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+
| Crumsort 1.1.5.3   |            29/4 seconds |            129/4 seconds |             61/2 seconds |             173/3 seconds |             339/7 seconds |                        N.A. |
|                    |   351,332,884,902; 2.43 |  1,284,147,329,900; 2.80 |    603,065,127,518; 2.77 |   1,605,371,408,284; 2.64 |   3,102,283,088,926; 2.71 |                        N.A. |
+--------------------+-------------------------+--------------------------+--------------------------+---------------------------+---------------------------+-----------------------------+

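Legend: the time is exactly the sort process time; the first value is for unsorted input, the second for already sorted input. The second row of each performer gives the instruction count and the IPC (instructions per cycle).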

« on: March 08, 2022, 05:10:05 pm »

A week ago, Scandum released a killer: Crumsort, the current BEST quicksort and the FASTEST unstable in-place quicksort in my tests.

As I see it, he dived deep into sorting and achieved outstanding results.

Yet, intuitively, I see a different approach that could potentially reach and even surpass his awesome speeds; that remains to be seen...

Currently, the high-speed sorting picture is this (the benchmark package is attached so that you can reproduce the results):

Code:

// Test run: 2022-Mar-08:
// Laptop "Compressionette", Intel 'Kaby Lake' i5-7200U 3.1GHz max turbo, 36GB DDR4 2133MHz:
// +--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+
// | Performer/Keys     | #1, FEW distinct          | #2, MANY distinct         | #3, MANYmore distinct     | #4, ALL distinct          | #5, ALLmore distinct       | #6, ALLmax distinct         |
// +--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+
// |  Operating System, | Windows 10, | Fedora 35,  | Windows 10, | Fedora 35,  | Windows 10, | Fedora 35,  | Windows 10, | Fedora 35,  | Windows 10, | Fedora 35,   | Windows 10,  | Fedora 35,   |
// |      Compiler, -O3 | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1   | Intel v15.0  | GCC 11.2.1   |
// +--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+
// | qsort              |  59 seconds | 377 seconds | 336 seconds | 541 seconds | 157 seconds | 195 seconds | 435 seconds | 534 seconds | 851 seconds | 1036 seconds |         N.A. |         N.A. |
// | Magnetica v.13     |  31 seconds |  30 seconds | 202 seconds | 196 seconds |  88 seconds |  85 seconds | 259 seconds | 250 seconds | 506 seconds |  493 seconds |         N.A. |         N.A. |
// | Bentley-McIlroy    |  38 seconds |  36 seconds | 205 seconds | 208 seconds |  92 seconds |  94 seconds | 279 seconds | 281 seconds | 544 seconds |  553 seconds |         N.A. |         N.A. |
// | Crumsort           |  30 seconds |  32 seconds | 132 seconds | 150 seconds |  64 seconds |  70 seconds | 184 seconds | 192 seconds | 357 seconds |  376 seconds |         N.A. |         N.A. |
// +--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+
// | Best Time (bare    |                           |                           |                           |                           |                            |                             |
// | bone in-place QS): |  30s PARITY               | 132s for Crumsort         |  64s for Crumsort         | 184s for Crumsort         | 357s for Crumsort          | N.A.                        |
// +--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+

Code:

// +--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+
// | Performer/Keys     | #1, FEW distinct          | #2, MANY distinct         | #3, MANYmore distinct     | #4, ALL distinct          | #5, ALLmore distinct       | #6, ALLmax distinct         |
// +--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+
// |  Operating System, | Fedora 35, GCC 11.2.1     | Fedora 35, GCC 11.2.1     | Fedora 35, GCC 11.2.1     | Fedora 35, GCC 11.2.1     | Fedora 35, GCC 11.2.1      | Fedora 35, GCC 11.2.1       |
// |      Compiler, -O3 | instructions; IPC         | instructions; IPC         | instructions; IPC         | instructions; IPC         | instructions; IPC          | instructions; IPC           |
// +--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+
// | qsort              |   3,302,993,934,921; 2.75 |   2,983,579,082,155; 1.75 |     886,263,153,476; 1.41 |   2,352,769,705,563; 1.38 |    4,527,367,288,814; 1.39 |                        N.A. |
// | Magnetica v.13     |     131,873,917,282; 1.00 |     658,478,895,600; 1.02 |     309,149,594,748; 1.08 |     884,297,729,161; 1.06 |    1,726,931,029,634; 1.08 |                        N.A. |
// | Bentley-McIlroy    |     164,915,835,204; 1.09 |     681,956,155,364; 1.00 |     322,584,009,352; 1.04 |     944,038,959,690; 1.02 |    1,719,825,062,847; 0.97 |                        N.A. |
// | Crumsort           |     312,328,497,447; 2.29 |   1,295,551,817,497; 2.57 |     603,276,911,007; 2.53 |   1,597,685,291,532; 2.46 |    3,091,001,982,856; 2.51 |                        N.A. |
// +--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+

Code:

// Speed Roster (the base speed 1.00x is GLIBC's qsort):
// Rank #1: 2683/820=  3.27x =  32+150+ 70+192+ 376=  820 seconds for Crumsort
// Rank #2: 2683/1054= 2.54x =  30+196+ 85+250+ 493= 1054 seconds for Magnetica v.13
// Rank #3: 2683/1172= 2.28x =  36+208+ 94+281+ 553= 1172 seconds for Bentley-McIlroy
// Rank #4: 2683/2683= 1.00x = 377+541+195+534+1036= 2683 seconds for GLIBC's qsort
//
// Legend (The time is exactly the Sort process time):
// #1,FEW = 2,233,861,800 keys, of them distinct = 10; 178,708,944 bytes 22338618_QWORDS.bin; elements = 178,708,944/8 *100; // Keys are 100 times duplicated
// #2,MANY = 2,482,300,900 keys, of them distinct = 2,847,531; 24,823,016 bytes mobythesaurus.txt; elements = 24823016 -8+1; // BuildingBlocks are size-order+1, they are 100 times duplicated
// #3,MANYmore = 1,137,582,073 keys, of them distinct = 77,275,994; 1,137,582,080 bytes linux-5.15.25.tar; elements = 1137582080 -8+1; // BuildingBlocks are size-order+1
// #4,ALL = 2,009,333,753 keys, of them distinct = 1,912,608,132; 2,009,333,760 bytes Fedora-Workstation-Live-x86_64-35-1.2.iso; elements = 2009333760 -8+1; // BuildingBlocks are size-order+1
// #5,ALLmore = 3,803,483,825 keys, of them distinct = 3,346,259,533; 3,803,483,832 bytes Fedora-Workstation-35-1.2.aarch64.raw.xz; elements = 3803483832 -8+1; // BuildingBlocks are size-order+1
// #6,ALLmax = 7,798,235,435 keys, of them distinct = 6,770,144,405; 7,798,235,442 bytes math.stackexchange.com_en_all_2019-02.zim; elements = 7798235442 -8+1; // BuildingBlocks are size-order+1

Code:

// Notes:
// - Scandum's Crumsort is the FASTEST in-place sorter known to me, hail Scandum!
// - All runs were made under "Current priority class is REALTIME_PRIORITY_CLASS" on Windows and "Current priority is -20." on Linux;
//   for more stats (the tables were derived from them), see 'log_i5-7200U_MAR08.txt';
// - The benchmark needs 32GB RAM, and 64GB for the 6th dataset;
// - The whole package (except the 3rd, 4th, 5th and 6th datasets) is downloadable at:
//   www.sanmayce.com/QS_showdown_r13.zip
// - To reproduce the roster, run on Windows or Linux:
//   - BENCH_ICL32GB.BAT
//   - BENCH_ICL64GB.BAT
//   - sh bench_gcc32GB.sh
//   - sh bench_gcc64GB.sh
// - 3rd dataset is downloadable at:
// https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.15.25.tar.xz
// - 4th dataset is downloadable at:
// https://download.fedoraproject.org/pub/fedora/linux/releases/35/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-35-1.2.iso 
// - 5th dataset is downloadable at:
// https://download.fedoraproject.org/pub/fedora/linux/releases/35/Workstation/aarch64/images/Fedora-Workstation-35-1.2.aarch64.raw.xz 
// - 6th dataset is downloadable at:
// https://download.kiwix.org/zim/stack_exchange/math.stackexchange.com_en_all_2019-02.zim
// - Managed to reduce the main loop to 68 bytes (from 71 in r.12); the gain comes from switching to pointers, which I should have done long ago, but I wanted to keep the etude ARRAY-syntax friendly. Now r.12 uses arrays, whereas r.13 uses pointers. Another turning point: GCC now beats ICL in all tests.

The current best is at:
https://github.com/scandum/crumsort

Here comes Magnetica v13, a view from the kitchen. This time around, GCC v11.2.1 proves superior to ICL v15.0, generating faster code in all tests:

Code:

// And the Assembly code generated by GCC 11.2.1 for Magnetica_v13:
// Quicksort_QB64_v13:
// .LFB214:
// 	.cfi_startproc
// 	pushq	%r15
// 	.cfi_def_cfa_offset 16
// 	.cfi_offset 15, -16
// 	leaq	-8(%rdi,%rsi,8), %r10
// 	movq	%rdi, %xmm0
// 	movq	%rdi, %rax
// 	pushq	%r14
// 	.cfi_def_cfa_offset 24
// 	.cfi_offset 14, -24
// 	movq	%r10, %xmm2
// 	movl	$2, %r15d
// 	pushq	%r13
// 	.cfi_def_cfa_offset 32
// 	.cfi_offset 13, -32
// 	punpcklqdq	%xmm2, %xmm0
// 	pushq	%r12
// 	.cfi_def_cfa_offset 40
// 	.cfi_offset 12, -40
// 	pushq	%rbp
// 	.cfi_def_cfa_offset 48
// 	.cfi_offset 6, -48
// 	pushq	%rbx
// 	.cfi_def_cfa_offset 56
// 	.cfi_offset 3, -56
// 	subq	$143920, %rsp
// 	.cfi_def_cfa_offset 143976
// 	movups	%xmm0, -96(%rsp)
// 	.p2align 4,,10
// 	.p2align 3
// .L5115:
// 	movq	%r10, %rcx
// 	subq	$2, %r15
// 	subq	%rax, %rcx
// 	testq	%rcx, %rcx
// 	jle	.L5096
// 	leaq	8(%rax), %rbx
// .L5113:
// 	movq	%rcx, %rsi
// 	movq	(%rax), %rdx
// 	sarq	$3, %rsi
// 	cmpq	$55, %rcx
// 	ja	.L5097
// 	jmp	*.L5099(,%rsi,8)
// 	.section	.rodata
// 	.align 8
// 	.align 4
// .L5099:
// 	.quad	.L5097
// 	.quad	.L5104
// 	.quad	.L5103
// 	.quad	.L5102
// 	.quad	.L5101
// 	.quad	.L5100
// 	.quad	.L5098
// 	.text
// 	.p2align 4,,10
// 	.p2align 3
// .L5098:
// 	movq	8(%rax), %rbp
// 	movq	16(%rax), %rbx
// 	xorl	%ecx, %ecx
// 	movq	24(%rax), %r11
// 	movq	32(%rax), %r10
// 	cmpq	%rdx, %rbp
// 	movq	40(%rax), %r9
// 	movq	48(%rax), %r12
// 	setb	%cl
// 	cmpq	%rdx, %rbx
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r11
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r10
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r9
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r12
// 	adcl	$0, %ecx
// 	xorl	%r8d, %r8d
// 	cmpq	%rdx, %rbp
// 	movl	%ecx, -108(%rsp)
// 	setnb	%r8b
// 	xorl	%ecx, %ecx
// 	cmpq	%rbx, %rbp
// 	seta	%cl
// 	addl	%ecx, %r8d
// 	xorl	%ecx, %ecx
// 	cmpq	%r11, %rbp
// 	seta	%cl
// 	addl	%ecx, %r8d
// 	xorl	%ecx, %ecx
// 	cmpq	%r10, %rbp
// 	seta	%cl
// 	addl	%ecx, %r8d
// 	xorl	%ecx, %ecx
// 	cmpq	%r9, %rbp
// 	seta	%cl
// 	addl	%ecx, %r8d
// 	xorl	%ecx, %ecx
// 	cmpq	%r12, %rbp
// 	seta	%cl
// 	xorl	%edi, %edi
// 	addl	%ecx, %r8d
// 	cmpq	%rdx, %rbx
// 	setnb	%dil
// 	xorl	%ecx, %ecx
// 	cmpq	%rbx, %rbp
// 	setbe	%cl
// 	addl	%ecx, %edi
// 	xorl	%ecx, %ecx
// 	cmpq	%r11, %rbx
// 	seta	%cl
// 	addl	%ecx, %edi
// 	xorl	%ecx, %ecx
// 	cmpq	%r10, %rbx
// 	seta	%cl
// 	addl	%ecx, %edi
// 	xorl	%ecx, %ecx
// 	cmpq	%r9, %rbx
// 	seta	%cl
// 	addl	%ecx, %edi
// 	xorl	%ecx, %ecx
// 	cmpq	%r12, %rbx
// 	seta	%cl
// 	xorl	%esi, %esi
// 	addl	%ecx, %edi
// 	cmpq	%rdx, %r11
// 	setnb	%sil
// 	xorl	%ecx, %ecx
// 	cmpq	%r11, %rbp
// 	setbe	%cl
// 	addl	%ecx, %esi
// 	xorl	%ecx, %ecx
// 	cmpq	%r11, %rbx
// 	setbe	%cl
// 	addl	%ecx, %esi
// 	xorl	%ecx, %ecx
// 	cmpq	%r10, %r11
// 	seta	%cl
// 	addl	%ecx, %esi
// 	xorl	%ecx, %ecx
// 	cmpq	%r9, %r11
// 	seta	%cl
// 	addl	%ecx, %esi
// 	xorl	%ecx, %ecx
// 	cmpq	%r12, %r11
// 	seta	%cl
// 	addl	%ecx, %esi
// 	xorl	%ecx, %ecx
// 	cmpq	%rdx, %r10
// 	setnb	%cl
// 	xorl	%r13d, %r13d
// 	cmpq	%r10, %rbp
// 	setbe	%r13b
// 	addl	%r13d, %ecx
// 	xorl	%r13d, %r13d
// 	cmpq	%r10, %rbx
// 	setbe	%r13b
// 	addl	%r13d, %ecx
// 	xorl	%r13d, %r13d
// 	cmpq	%r10, %r11
// 	setbe	%r13b
// 	addl	%r13d, %ecx
// 	xorl	%r13d, %r13d
// 	cmpq	%r9, %r10
// 	seta	%r13b
// 	addl	%r13d, %ecx
// 	xorl	%r13d, %r13d
// 	cmpq	%r12, %r10
// 	seta	%r13b
// 	xorl	%r14d, %r14d
// 	addl	%r13d, %ecx
// 	cmpq	%rdx, %r9
// 	setnb	%r14b
// 	xorl	%r13d, %r13d
// 	cmpq	%r9, %rbp
// 	setbe	%r13b
// 	addl	%r14d, %r13d
// 	xorl	%r14d, %r14d
// 	cmpq	%r9, %rbx
// 	setbe	%r14b
// 	addl	%r13d, %r14d
// 	xorl	%r13d, %r13d
// 	cmpq	%r9, %r11
// 	setbe	%r13b
// 	addl	%r14d, %r13d
// 	xorl	%r14d, %r14d
// 	cmpq	%r9, %r10
// 	setbe	%r14b
// 	addl	%r13d, %r14d
// 	xorl	%r13d, %r13d
// 	cmpq	%r12, %r9
// 	seta	%r13b
// 	addl	%r14d, %r13d
// 	movslq	-108(%rsp), %r14
// 	movq	%rdx, (%rax,%r14,8)
// 	movslq	%r8d, %rdx
// 	movl	-108(%rsp), %r14d
// 	movq	%rbp, (%rax,%rdx,8)
// 	movslq	%edi, %rdx
// 	movq	%rbx, (%rax,%rdx,8)
// 	movslq	%esi, %rdx
// 	movq	%r11, (%rax,%rdx,8)
// 	movslq	%ecx, %rdx
// 	movq	%r10, (%rax,%rdx,8)
// 	movslq	%r13d, %rdx
// 	movq	%r9, (%rax,%rdx,8)
// 	leal	(%r14,%r8), %edx
// 	addl	%edi, %edx
// 	addl	%esi, %edx
// 	addl	%ecx, %edx
// 	movl	$21, %ecx
// 	addl	%r13d, %edx
// 	subl	%edx, %ecx
// 	movslq	%ecx, %rdx
// 	movq	%r12, (%rax,%rdx,8)
// 	.p2align 4,,10
// 	.p2align 3
// .L5096:
// 	testq	%r15, %r15
// 	je	.L5095
// .L5121:
// 	movq	-104(%rsp,%r15,8), %r10
// 	movq	-112(%rsp,%r15,8), %rax
// 	jmp	.L5115
// 	.p2align 4,,10
// 	.p2align 3
// .L5103:
// 	movq	8(%rax), %rdi
// 	movq	16(%rax), %r8
// 	xorl	%ecx, %ecx
// 	cmpq	%rdx, %rdi
// 	setb	%cl
// 	cmpq	%rdx, %r8
// 	adcl	$0, %ecx
// 	xorl	%esi, %esi
// 	cmpq	%rdx, %rdi
// 	setnb	%sil
// 	xorl	%r9d, %r9d
// 	cmpq	%r8, %rdi
// 	seta	%r9b
// 	addl	%r9d, %esi
// 	movslq	%ecx, %r9
// 	movq	%rdx, (%rax,%r9,8)
// 	movslq	%esi, %rdx
// 	addl	%esi, %ecx
// 	movq	%rdi, (%rax,%rdx,8)
// 	movl	$3, %edx
// 	subl	%ecx, %edx
// 	movslq	%edx, %rdx
// 	movq	%r8, (%rax,%rdx,8)
// 	testq	%r15, %r15
// 	jne	.L5121
// .L5095:
// 	addq	$143920, %rsp
// 	.cfi_remember_state
// 	.cfi_def_cfa_offset 56
// 	popq	%rbx
// 	.cfi_def_cfa_offset 48
// 	popq	%rbp
// 	.cfi_def_cfa_offset 40
// 	popq	%r12
// 	.cfi_def_cfa_offset 32
// 	popq	%r13
// 	.cfi_def_cfa_offset 24
// 	popq	%r14
// 	.cfi_def_cfa_offset 16
// 	popq	%r15
// 	.cfi_def_cfa_offset 8
// 	ret
// 	.p2align 4,,10
// 	.p2align 3
// .L5104:
// 	.cfi_restore_state
// 	movq	8(%rax), %rcx
// 	cmpq	%rdx, %rcx
// 	sbbq	%rsi, %rsi
// 	andl	$8, %esi
// 	cmpq	%rdx, %rcx
// 	movq	%rdx, (%rax,%rsi)
// 	movl	$1, %edx
// 	sbbl	$0, %edx
// 	movslq	%edx, %rdx
// 	movq	%rcx, (%rax,%rdx,8)
// 	jmp	.L5096
// 	.p2align 4,,10
// 	.p2align 3
// .L5101:
// 	movq	8(%rax), %r10
// 	movq	16(%rax), %r9
// 	xorl	%ecx, %ecx
// 	movq	24(%rax), %r8
// 	movq	32(%rax), %rdi
// 	cmpq	%rdx, %r10
// 	setb	%cl
// 	cmpq	%rdx, %r9
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r8
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %rdi
// 	adcl	$0, %ecx
// 	xorl	%esi, %esi
// 	cmpq	%rdx, %r10
// 	setnb	%sil
// 	xorl	%r11d, %r11d
// 	cmpq	%r9, %r10
// 	seta	%r11b
// 	addl	%r11d, %esi
// 	xorl	%r11d, %r11d
// 	cmpq	%r8, %r10
// 	seta	%r11b
// 	addl	%r11d, %esi
// 	xorl	%r11d, %r11d
// 	cmpq	%rdi, %r10
// 	seta	%r11b
// 	addl	%esi, %r11d
// 	xorl	%esi, %esi
// 	cmpq	%rdx, %r9
// 	setnb	%sil
// 	xorl	%ebx, %ebx
// 	cmpq	%r9, %r10
// 	setbe	%bl
// 	addl	%ebx, %esi
// 	xorl	%ebx, %ebx
// 	cmpq	%r8, %r9
// 	seta	%bl
// 	addl	%ebx, %esi
// 	xorl	%ebx, %ebx
// 	cmpq	%rdi, %r9
// 	seta	%bl
// 	addl	%esi, %ebx
// 	xorl	%esi, %esi
// 	cmpq	%rdx, %r8
// 	setnb	%sil
// 	xorl	%ebp, %ebp
// 	cmpq	%r8, %r10
// 	setbe	%bpl
// 	addl	%ebp, %esi
// 	xorl	%ebp, %ebp
// 	cmpq	%r8, %r9
// 	setbe	%bpl
// 	addl	%ebp, %esi
// 	xorl	%ebp, %ebp
// 	cmpq	%rdi, %r8
// 	seta	%bpl
// 	addl	%ebp, %esi
// 	movslq	%ecx, %rbp
// 	addl	%r11d, %ecx
// 	movq	%rdx, (%rax,%rbp,8)
// 	movslq	%r11d, %rdx
// 	addl	%ebx, %ecx
// 	movq	%r10, (%rax,%rdx,8)
// 	movslq	%ebx, %rdx
// 	addl	%esi, %ecx
// 	movq	%r9, (%rax,%rdx,8)
// 	movslq	%esi, %rdx
// 	movq	%r8, (%rax,%rdx,8)
// 	movl	$10, %edx
// 	subl	%ecx, %edx
// 	movslq	%edx, %rdx
// 	movq	%rdi, (%rax,%rdx,8)
// 	jmp	.L5096
// 	.p2align 4,,10
// 	.p2align 3
// .L5102:
// 	movq	8(%rax), %r10
// 	movq	16(%rax), %r9
// 	xorl	%ecx, %ecx
// 	movq	24(%rax), %r8
// 	cmpq	%rdx, %r10
// 	setb	%cl
// 	cmpq	%rdx, %r9
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r8
// 	adcl	$0, %ecx
// 	xorl	%edi, %edi
// 	cmpq	%rdx, %r10
// 	setnb	%dil
// 	xorl	%esi, %esi
// 	cmpq	%r9, %r10
// 	seta	%sil
// 	addl	%esi, %edi
// 	xorl	%esi, %esi
// 	cmpq	%r8, %r10
// 	seta	%sil
// 	addl	%esi, %edi
// 	xorl	%esi, %esi
// 	cmpq	%rdx, %r9
// 	setnb	%sil
// 	xorl	%r11d, %r11d
// 	cmpq	%r9, %r10
// 	setbe	%r11b
// 	addl	%r11d, %esi
// 	xorl	%r11d, %r11d
// 	cmpq	%r8, %r9
// 	seta	%r11b
// 	addl	%r11d, %esi
// 	movslq	%ecx, %r11
// 	addl	%edi, %ecx
// 	movq	%rdx, (%rax,%r11,8)
// 	movslq	%edi, %rdx
// 	addl	%esi, %ecx
// 	movq	%r10, (%rax,%rdx,8)
// 	movslq	%esi, %rdx
// 	movq	%r9, (%rax,%rdx,8)
// 	movl	$6, %edx
// 	subl	%ecx, %edx
// 	movslq	%edx, %rdx
// 	movq	%r8, (%rax,%rdx,8)
// 	jmp	.L5096
// 	.p2align 4,,10
// 	.p2align 3
// .L5100:
// 	movq	8(%rax), %r12
// 	movq	16(%rax), %rbp
// 	xorl	%ecx, %ecx
// 	movq	24(%rax), %r11
// 	movq	32(%rax), %r10
// 	cmpq	%rdx, %r12
// 	movq	40(%rax), %rbx
// 	setb	%cl
// 	cmpq	%rdx, %rbp
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r11
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %r10
// 	adcl	$0, %ecx
// 	cmpq	%rdx, %rbx
// 	adcl	$0, %ecx
// 	xorl	%r9d, %r9d
// 	cmpq	%rdx, %r12
// 	setnb	%r9b
// 	xorl	%esi, %esi
// 	cmpq	%rbp, %r12
// 	seta	%sil
// 	addl	%esi, %r9d
// 	xorl	%esi, %esi
// 	cmpq	%r11, %r12
// 	seta	%sil
// 	addl	%esi, %r9d
// 	xorl	%esi, %esi
// 	cmpq	%r10, %r12
// 	seta	%sil
// 	addl	%esi, %r9d
// 	xorl	%esi, %esi
// 	cmpq	%rbx, %r12
// 	seta	%sil
// 	xorl	%r8d, %r8d
// 	addl	%esi, %r9d
// 	cmpq	%rdx, %rbp
// 	setnb	%r8b
// 	xorl	%esi, %esi
// 	cmpq	%rbp, %r12
// 	setbe	%sil
// 	addl	%esi, %r8d
// 	xorl	%esi, %esi
// 	cmpq	%r11, %rbp
// 	seta	%sil
// 	addl	%esi, %r8d
// 	xorl	%esi, %esi
// 	cmpq	%r10, %rbp
// 	seta	%sil
// 	addl	%esi, %r8d
// 	xorl	%esi, %esi
// 	cmpq	%rbx, %rbp
// 	seta	%sil
// 	xorl	%edi, %edi
// 	addl	%esi, %r8d
// 	cmpq	%rdx, %r11
// 	setnb	%dil
// 	xorl	%esi, %esi
// 	cmpq	%r11, %r12
// 	setbe	%sil
// 	addl	%esi, %edi
// 	xorl	%esi, %esi
// 	cmpq	%r11, %rbp
// 	setbe	%sil
// 	addl	%esi, %edi
// 	xorl	%esi, %esi
// 	cmpq	%r10, %r11
// 	seta	%sil
// 	addl	%esi, %edi
// 	xorl	%esi, %esi
// 	cmpq	%rbx, %r11
// 	seta	%sil
// 	addl	%esi, %edi
// 	xorl	%esi, %esi
// 	cmpq	%rdx, %r10
// 	setnb	%sil
// 	xorl	%r13d, %r13d
// 	cmpq	%r10, %r12
// 	setbe	%r13b
// 	addl	%r13d, %esi
// 	xorl	%r13d, %r13d
// 	cmpq	%r10, %rbp
// 	setbe	%r13b
// 	addl	%r13d, %esi
// 	xorl	%r13d, %r13d
// 	cmpq	%r10, %r11
// 	setbe	%r13b
// 	addl	%r13d, %esi
// 	xorl	%r13d, %r13d
// 	cmpq	%rbx, %r10
// 	seta	%r13b
// 	addl	%r13d, %esi
// 	movslq	%ecx, %r13
// 	movq	%rdx, (%rax,%r13,8)
// 	movslq	%r9d, %rdx
// 	movq	%r12, (%rax,%rdx,8)
// 	movslq	%r8d, %rdx
// 	movq	%rbp, (%rax,%rdx,8)
// 	movslq	%edi, %rdx
// 	movq	%r11, (%rax,%rdx,8)
// 	movslq	%esi, %rdx
// 	movq	%r10, (%rax,%rdx,8)
// 	leal	(%rcx,%r9), %edx
// 	movl	$15, %ecx
// 	addl	%r8d, %edx
// 	addl	%edi, %edx
// 	addl	%esi, %edx
// 	subl	%edx, %ecx
// 	movslq	%ecx, %rdx
// 	movq	%rbx, (%rax,%rdx,8)
// 	jmp	.L5096
// .L5097:
// 	sarq	$2, %rsi
// 	leaq	(%rax,%rsi,8), %rcx
// 	movq	(%rcx), %r8
// 	movq	%rdx, (%rcx)
// 	movq	%r8, (%rax)
// 	cmpq	%rax, %r10
// 	jbe	.L5105
// 	movq	%rax, %rdx
// 	movq	%rax, %r9
// 	movq	%r10, %rcx
// 	jmp	.L5111
// 	.p2align 4,,10
// 	.p2align 3
// .L5122:
// 	movq	(%r9), %rdi
// 	movq	%rsi, (%r9)
// 	leaq	16(%rdx), %rsi
// 	addq	$8, %r9
// 	movq	%rdi, 8(%rdx)
// 	movq	%r11, %rdx
// .L5107:
// 	cmpq	%rdx, %rcx
// 	jbe	.L5112
// .L5111:
// 	movq	8(%rdx), %rsi
// 	leaq	8(%rdx), %r11
// 	cmpq	%r8, %rsi
// 	jb	.L5122
// 	ja	.L5120
// 	leaq	16(%rdx), %rsi
// 	movq	%r11, %rdx
// 	cmpq	%rdx, %rcx
// 	ja	.L5111
// .L5112:
// 	movq	8(%rdx), %rdi
// 	movq	%rsi, %xmm0
// 	movq	%rcx, %rsi
// 	movq	%r10, %xmm1
// 	subq	%rdx, %rsi
// 	leaq	-8(%r9), %r10
// 	punpcklqdq	%xmm1, %xmm0
// 	addq	$2, %r15
// 	movq	%rdi, %r8
// 	sarq	$63, %rsi
// 	subq	8(%rcx), %r8
// 	movups	%xmm0, -112(%rsp,%r15,8)
// 	andq	%r8, %rsi
// 	subq	%rsi, %rdi
// 	movq	%rdi, 8(%rdx)
// 	addq	%rsi, 8(%rcx)
// 	movq	%r10, %rcx
// 	subq	%rax, %rcx
// 	testq	%rcx, %rcx
// 	jg	.L5113
// 	jmp	.L5096
// 	.p2align 4,,10
// 	.p2align 3
// .L5110:
// 	subq	$8, %rcx
// .L5120:
// 	movq	(%rcx), %rdi
// 	cmpq	%r8, %rdi
// 	ja	.L5110
// 	movq	%rdi, 8(%rdx)
// 	subq	$8, %rcx
// 	movq	%rsi, 8(%rcx)
// 	movq	%r11, %rsi
// 	jmp	.L5107
// 	.p2align 4,,10
// 	.p2align 3
// .L5105:
// 	movq	%rbx, %rsi
// 	movq	%rax, %rdx
// 	movq	%rax, %r9
// 	movq	%r10, %rcx
// 	jmp	.L5112
// 	.cfi_endproc

Tomorrow night I will run the GCC and ICL executables on Windows 10. The test machine will be 'Brutalitto', the mini-monster with a Zen 2 4800H and 64GB DDR4, running the heaviest sort benchmark on the Internet: 7+ billion QWORDs...

Add-on:

Code:

Test run: 2022-Mar-09:
Laptop "Brutalitto", AMD 'Renoir' 4800H 4.3GHz max turbo, 64GB DDR4 3200MHz:
+--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+
| Performer/Keys     | #1, FEW distinct          | #2, MANY distinct         | #3, MANYmore distinct     | #4, ALL distinct          | #5, ALLmore distinct       | #6, ALLmax distinct         |
+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+
|  Operating System, | Windows 10, | Windows 10, | Windows 10, | Windows 10, | Windows 10, | Windows 10, | Windows 10, | Windows 10, | Windows 10, | Windows 10,  | Windows 10,  | Windows 10,  |
|      Compiler, -O3 | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1  | Intel v15.0 | GCC 11.2.1   | Intel v15.0  | GCC 11.2.1   |
+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+
| qsort              |  42 seconds |  45 seconds | 242 seconds | 280 seconds | 113 seconds | 131 seconds | 319 seconds | 354 seconds | 620 seconds |  695 seconds | 1282 seconds | 1438 seconds |
| Magnetica v.13     |  22 seconds |  21 seconds | 135 seconds | 134 seconds |  60 seconds |  59 seconds | 177 seconds | 177 seconds | 350 seconds |  349 seconds |  724 seconds |  722 seconds |
| Bentley-McIlroy    |  24 seconds |  24 seconds | 146 seconds | 142 seconds |  66 seconds |  64 seconds | 200 seconds | 193 seconds | 391 seconds |  376 seconds |  850 seconds |  803 seconds |
| Crumsort           |  20 seconds |  19 seconds |  91 seconds |  81 seconds |  44 seconds |  38 seconds | 126 seconds | 109 seconds | 246 seconds |  211 seconds |  567 seconds |  479 seconds |
+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+
| Best Time (bare    |                           |                           |                           |                           |                            |                             |
| bone in-place QS): |  19s for Crumsort         |  81s for Crumsort         |  38s for Crumsort         | 109s for Crumsort         | 211s for Crumsort          | 479s for Crumsort           |
+--------------------+---------------------------+---------------------------+---------------------------+---------------------------+----------------------------+-----------------------------+

Code: [Select]

Speed Roster (the base speed 1.00x is GLIBC's qsort; totals use the GCC 11.2.1 timings):
Rank #1: 2943/937=  3.14x = 19+ 81+ 38+109+211+ 479=  937 seconds for Crumsort
Rank #2: 2943/1462= 2.01x = 21+134+ 59+177+349+ 722= 1462 seconds for Magnetica v.13
Rank #3: 2943/1602= 1.84x = 24+142+ 64+193+376+ 803= 1602 seconds for Bentley-McIlroy
Rank #4: 2943/2943= 1.00x = 45+280+131+354+695+1438= 2943 seconds for GLIBC's qsort
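The roster's arithmetic can be re-checked mechanically. The sketch below sums each performer's GCC 11.2.1 column from the results table and divides GLIBC qsort's total by it; the array and function names are made up for illustration.

```c
#include <stdio.h>

/* GCC 11.2.1 timings in seconds for keysets #1..#6, copied from the results table. */
static const int gcc_times[4][6] = {
    {45, 280, 131, 354, 695, 1438}, /* GLIBC's qsort (the 1.00x baseline) */
    {21, 134,  59, 177, 349,  722}, /* Magnetica v.13 */
    {24, 142,  64, 193, 376,  803}, /* Bentley-McIlroy */
    {19,  81,  38, 109, 211,  479}  /* Crumsort */
};
static const char *const names[4] = {
    "GLIBC's qsort", "Magnetica v.13", "Bentley-McIlroy", "Crumsort"
};

/* Sum of one performer's times over all six keysets. */
int total_seconds(const int times[6]) {
    int sum = 0;
    for (int k = 0; k < 6; k++) sum += times[k];
    return sum;
}

/* Speedup relative to the baseline total, e.g. 2943/937 for Crumsort. */
double speedup(int baseline_total, int total) {
    return (double)baseline_total / (double)total;
}

/* Prints one line per performer, in table order (not sorted by rank). */
void print_roster(void) {
    int base = total_seconds(gcc_times[0]);
    for (int p = 0; p < 4; p++) {
        int t = total_seconds(gcc_times[p]);
        printf("%-16s %5d s  %.2fx\n", names[p], t, speedup(base, t));
    }
}
```

print_roster() prints the ratios rounded to two decimals; for Bentley-McIlroy, 2943/1602 is about 1.837.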

Programs / Re: A skeleton code for Text Scroller via Drag-and-Drop

« on: February 22, 2022, 03:52:53 am »

Glad to share the latest-n-fastest Quicksort around, Magnetica r.5; the source code is attached.

I spotted a conditional swap inside the main loop; in r.5 this branch has been removed from the hottest loop and placed outside it.

So, r.5 is algorithmically superior to r.4. Performance varies between AMD and Intel CPUs and between the ICL and GCC compilers, but overall it pushes the limits once more.

Behold the simplicity itself:

Code: QB64: [Select]

// Magnetica r.5BB:
/*
; mark_description "Intel(R) C++ Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726";
; mark_description "-S -O3 -D_N_HIGH_PRIORITY";
 
Quicksort_QB64_v9       PROC 
; parameter 1: rcx  (base address of the QWORD array)
; parameter 2: rdx  (Left index)
; parameter 3: r8   (Right index)
.B12.1::                        ; Preds .B12.0
        push      r13                                           ;1172.72
        push      r14                                           ;1172.72
        push      r15                                           ;1172.72
        push      rbp                                           ;1172.72
        mov       eax, 800024                                   ;1172.72
        call      __chkstk                                      ;1172.72
        sub       rsp, 800024                                   ;1172.72
        mov       eax, 2                                        ;1191.5
        mov       QWORD PTR [40+rsp], rdx                       ;1190.17
        mov       QWORD PTR [48+rsp], r8                        ;1191.17
                                
.B12.2::                        ; Preds .B12.20 .B12.1
        mov       rdx, QWORD PTR [32+rsp+rax*8]                 ;1193.17
        mov       r13, QWORD PTR [24+rsp+rax*8]                 ;1194.16
        add       rax, -2                                       ;1195.31
        cmp       r13, rdx                                      ;1197.47
        jge       .B12.20       ; Prob 10%                      ;1197.47
                                
.B12.3::                        ; Preds .B12.2
        mov       r9, r13                                       ;1199.13
                                
.B12.4::                        ; Preds .B12.18 .B12.3
        mov       r10, QWORD PTR [rcx+r13*8]                    ;1229.13
        lea       r15, QWORD PTR [r13+rdx]                      ;1229.48
        sar       r15, 1                                        ;1229.56
        mov       r14, r13                                      ;1200.13
        mov       r11, rdx                                      ;1198.13
        mov       rbp, r9                                       ;1199.13
        mov       r9, r14                                       ;1200.13
        mov       r8, QWORD PTR [rcx+r15*8]                     ;1229.13
        mov       QWORD PTR [rcx+r15*8], r10                    ;1229.13
        mov       QWORD PTR [rcx+r13*8], r8                     ;1229.13
                                
.B12.5::                        ; Preds .B12.4 .B12.13
        inc       r14                                           ;1282.27
        mov       r10, QWORD PTR [rcx+r14*8]                    ;1283.29
        cmp       r8, r10                                       ;1283.29
        ja        .B12.12       ; Prob 22%                      ;1283.29
                                
.B12.6::                        ; Preds .B12.5
        jae       .B12.13       ; Prob 50%                      ;1287.36
                                
.B12.7::                        ; Preds .B12.6
        mov       r15, QWORD PTR [rcx+r11*8]                    ;1288.35
        cmp       r8, r15                                       ;1288.35
        jae       .B12.11       ; Prob 10%                      ;1288.35
                                
.B12.9::                        ; Preds .B12.7 .B12.9
        dec       r11                                           ;1289.39
        mov       r15, QWORD PTR [rcx+r11*8]                    ;1288.35
        cmp       r8, r15                                       ;1288.35
        jb        .B12.9        ; Prob 82%                      ;1288.35
                                
.B12.11::                       ; Preds .B12.9 .B12.7
        mov       QWORD PTR [rcx+r14*8], r15                    ;1291.21
        dec       r14                                           ;1296.31
        mov       QWORD PTR [rcx+r11*8], r10                    ;1291.21
        dec       r11                                           ;1295.35
        jmp       .B12.13       ; Prob 100%                     ;1295.35
                                
.B12.12::                       ; Preds .B12.5
        mov       r15, QWORD PTR [rcx+rbp*8]                    ;1284.21
        mov       QWORD PTR [rcx+rbp*8], r10                    ;1284.21
        inc       rbp                                           ;1285.31
        mov       QWORD PTR [rcx+r14*8], r15                    ;1284.21
                                
.B12.13::                       ; Preds .B12.12 .B12.11 .B12.6
        cmp       r14, r11                                      ;1281.24
        jl        .B12.5        ; Prob 82%                      ;1281.24
                                
.B12.14::                       ; Preds .B12.13
        jle       .B12.16       ; Prob 78%                      ;1299.22
                                
.B12.15::                       ; Preds .B12.14
        mov       r8, QWORD PTR [8+rcx+r11*8]                   ;1300.21
        mov       r10, QWORD PTR [8+rcx+r14*8]                  ;1300.21
        mov       QWORD PTR [8+rcx+r14*8], r8                   ;1300.21
        mov       QWORD PTR [8+rcx+r11*8], r10                  ;1300.21
                                
.B12.16::                       ; Preds .B12.14 .B12.15
        inc       r14                                           ;1655.25
        cmp       rdx, r14                                      ;1730.49
        jle       .B12.18       ; Prob 62%                      ;1730.49
                                
.B12.17::                       ; Preds .B12.16
        add       rax, 2                                        ;1731.39
        mov       QWORD PTR [24+rsp+rax*8], r14                 ;1732.17
        mov       QWORD PTR [32+rsp+rax*8], rdx                 ;1733.17
                                
.B12.18::                       ; Preds .B12.17 .B12.16
        lea       rdx, QWORD PTR [-1+rbp]                       ;1654.25
        cmp       r13, rdx                                      ;1197.47
        jl        .B12.4        ; Prob 82%                      ;1197.47
                                
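.B12.20::                       ; Preds .B12.18 .B12.2
; The DB bytes below are multi-byte NOPs emitted as raw bytes (padding):
; 15,31,64,0 = 0F 1F 40 00 (4-byte NOP); 15,31,128,0,0,0,0 = 0F 1F 80 00 00 00 00 (7-byte NOP)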
        DB        15                                            ;1743.28
        DB        31                                            ;1743.28
        DB        64                                            ;1743.28
        DB        0                                             ;1743.28
        DB        15                                            ;1743.28
        DB        31                                            ;1743.28
        DB        128                                           ;1743.28
        DB        0                                             ;1743.28
        DB        0                                             ;1743.28
        DB        0                                             ;1743.28
        DB        0                                             ;1743.28
        test      rax, rax                                      ;1743.28
        jne       .B12.2        ; Prob 99%                      ;1743.28
                                
.B12.21::                       ; Preds .B12.20
        add       rsp, 800024                                   ;1753.1
        pop       rbp                                           ;1753.1
        pop       r15                                           ;1753.1
        pop       r14                                           ;1753.1
        pop       r13                                           ;1753.1
        ret                                                     ;1753.1
        ALIGN     16
                                
.B12.22::
; mark_end;
Quicksort_QB64_v9 ENDP
*/

So that we can gladly say QB64 stands for "Quickness" once again, my wish is for the QB64 community to have the qsm.h header (as a built-in function of the QB64 library), outperforming the qsort() C implementations of both Intel and GNU. Enjoy.
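The listing shows the overall shape of the routine: an explicit stack instead of recursion (note the roughly 800 KB `sub rsp, 800024` frame), a middle-element pivot, and unsigned 64-bit comparisons (`ja`/`jb`). Magnetica's actual C source is only in the attachment, so what follows is a generic sketch of those ideas, not Magnetica itself: it uses the textbook Dijkstra three-way partition, and `quicksort3_u64` and `STACK_PAIRS` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Processing the smaller side first keeps at most log2(n) pairs on the stack. */
#define STACK_PAIRS 64

static void swap_u64(uint64_t *a, uint64_t *b) {
    uint64_t t = *a;
    *a = *b;
    *b = t;
}

/* Sorts a[left..right] (inclusive) of unsigned 64-bit keys without recursion.
   Hypothetical illustration; NOT Magnetica's partition scheme. */
void quicksort3_u64(uint64_t *a, ptrdiff_t left, ptrdiff_t right) {
    ptrdiff_t stack[2 * STACK_PAIRS];
    int top = 0;
    for (;;) {
        while (left < right) {
            uint64_t pivot = a[left + (right - left) / 2]; /* middle element, by value */
            ptrdiff_t lt = left, i = left, gt = right;
            /* Invariant: a[left..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..right] > pivot */
            while (i <= gt) {
                if (a[i] < pivot)
                    swap_u64(&a[lt++], &a[i++]);
                else if (a[i] > pivot)
                    swap_u64(&a[i], &a[gt--]);
                else
                    i++;
            }
            /* Push the larger side, keep looping on the smaller one. */
            if (lt - left < right - gt) {
                stack[top++] = gt + 1;
                stack[top++] = right;
                right = lt - 1;
            } else {
                stack[top++] = left;
                stack[top++] = lt - 1;
                left = gt + 1;
            }
        }
        if (top == 0) break;
        right = stack[--top];
        left = stack[--top];
    }
}
```

The listing appears to push the right-hand part unconditionally and rely on a large fixed stack; pushing the larger part instead bounds the stack at about log2(n) pairs, at the price of one extra comparison per partition.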

Programs / e-pi-ALTES - the QB64 audio Visualizer - huge audiobooks Player

« on: January 11, 2022, 02:36:38 pm »

Here comes the dedicated home thread of Ефиалти (Ephialtes) a.k.a. epiALTES: the Linux/Windows GUI audio application that allows simple navigation through your audio tree while visualizing the magnitudes of the frequencies and the dynamic range of both channels.

As the thread title says, playing .WAV files bigger than 4GB is supported. The WAV header stores sizes in 4-byte (32-bit) fields, which limits them to 4GB, so e-pi-ALTES has two modes: when the file is bigger than 1GB (an arbitrarily chosen threshold) it switches to RAW mode, which disregards the header and reads ON-THE-FLY (without loading the file into RAM), thus enabling 4+GB support. For instance, I downloaded a YouTube video and extracted its audio with VLC; the 32+ hour recording became a 20GB .WAV (try navigating those 100+ thousand seconds with [Shift+|Ctrl+]Arrows, the skip step being 1s/10s/100s). The funny thing is that neither KMP/POT Player, nor Winamp, nor even VLC can play the 20GB .WAV; they stupidly read only some 5 hours of it.
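The "some 5 hours" is consistent with players trusting a 32-bit size field that has wrapped around. A worked check, under assumptions the post does not state (16-bit stereo PCM at 44.1 kHz, a canonical 44-byte header, exactly 32 hours): that is 20,321,280,000 bytes (about 20GB, matching the post), which modulo 2^32 is 3,141,410,816 bytes, i.e. 17,808 s or about 4.95 hours. The function names are hypothetical; `raw_offset` shows how a RAW mode can seek without consulting the size field at all.

```c
#include <stdint.h>

/* Assumptions (not stated in the post): 16-bit PCM, stereo, 44,100 Hz,
   canonical 44-byte RIFF/WAVE header. */
#define SAMPLE_RATE      44100u
#define BLOCK_ALIGN      4u                          /* 2 channels x 2 bytes */
#define BYTES_PER_SECOND (SAMPLE_RATE * BLOCK_ALIGN) /* 176,400 */
#define HEADER_BYTES     44u

/* True number of PCM bytes for a given duration. */
uint64_t pcm_bytes(uint64_t seconds) {
    return seconds * BYTES_PER_SECOND;
}

/* A 32-bit size field can only hold the true size modulo 2^32. */
uint32_t wrapped_size(uint64_t true_bytes) {
    return (uint32_t)true_bytes;
}

/* Duration a player would assume if it trusted the wrapped 32-bit size. */
uint64_t apparent_seconds(uint64_t true_bytes) {
    return wrapped_size(true_bytes) / BYTES_PER_SECOND;
}

/* RAW-mode seek: ignore the size field and compute where second `s` starts. */
uint64_t raw_offset(uint64_t s) {
    return HEADER_BYTES + s * BYTES_PER_SECOND;
}
```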

The case-insensitive filter (shown in the bottom-left corner) is applied instantly; the allowed keys are 'Backspace', 'Enter' and all letters, and scrolling uses Up/Down/Home/End/PgUp/PgDn.
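The post does not show the filter code (e-pi-ALTES is written in QB64), so here is a minimal C sketch of the idea: an ASCII-only, case-insensitive substring match, re-run over the path list after every letter or Backspace. `contains_ci` and `filter_paths` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive substring test; an empty filter matches everything. */
int contains_ci(const char *haystack, const char *needle) {
    if (*needle == '\0') return 1;
    for (const char *h = haystack; *h; h++) {
        const char *p = h, *q = needle;
        while (*p && *q && tolower((unsigned char)*p) == tolower((unsigned char)*q)) {
            p++;
            q++;
        }
        if (*q == '\0') return 1;
    }
    return 0;
}

/* Writes the indices of matching paths to `out` (room for n entries) and
   returns the count; called again after every keystroke that changes the filter. */
size_t filter_paths(const char *const *paths, size_t n, const char *filter, size_t *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (contains_ci(paths[i], filter)) out[m++] = i;
    return m;
}
```

A linear rescan like this is simple and, for a few thousand paths, typically fast enough to feel instant; a larger library might call for filtering only the previous result set when a letter is appended.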

The idea is to discard all unnecessary window pop-ups (as I saw Winamp doing) and reduce key presses to the bare minimum. In the next revisions I intend to add mouse support similar to the Masakari code; then the left hand will deal with the filtering, while the right rolls the wheel and clicks the third button instead of pressing 'Enter'... Often I wonder why the hell skilled coders cannot see how nifty simplicity is; here, not stressing the eyes with different colors/fonts/windows is the key to an ... easy-on-the-eyes GUI. As a by-product of the simplistic approach comes a file-explorer-like list of your (usually) deeply nested tree of files; seeing the full path gives the feeling of ... accessing a library.

The .ZIP package contains .EXE and .ELF files, i.e. the executables for Windows and Linux (32-bit and 64-bit).
Just extract the package into the parent folder of all your folders containing .MP3 and .WAV files, i.e. into the root of your phonotheque's tree.

Make a link/shortcut to the .exe or .elf, place it on your desktop and enjoy!

Suggestions, feedback or criticism are welcome... the idea is to write one unique player in QB64 that showcases the BASIC, i.e. SIMPLISTIC, approach to audio.

Edit, 2022-Jan-14:
Updated: fixed the path error by adding quotes; the bug happened with paths containing spaces. Still, if there are no .MP3s or .WAVs in the current tree, an error will be thrown.

Programs / Re: Get frequency in sound using _Vince algorithm

« on: December 31, 2021, 12:19:34 am »

Glad to share the latest and finest e-pi-ALTES; it features the "snowfall" effect.

Rewinding (backwards or forwards) can also be done with the Left/Right arrows.
Volume up/down can also be done with the Up/Down arrows.

Also, I added some screen-resolution-aware behavior: on old laptops (1366x768) e-pi-ALTES enforces the 8x8 Toshiba font, while on FHD (1920x1080) laptops it enforces another superb Japanese font, 6x12. My calculations on how to preserve undistorted scaling when going FULL-SCREEN are in the source code as comments; hope it looks good.

I strongly recommend using e-pi-ALTES when listening to audiobooks or Japanese audio narration (the latter are AWESOMELY done); the visualization comes ... falling, as in the superb song ending the classic film "Roadhouse", which goes "The night comes falling".

I love this very revision, for the first time I see what I hear, hee-hee.

I compiled it on Fedora 35 Linux; here comes the promo video (this is how it looks on old laptops, in 1680 or 1366 modes).

Have a nice new year!

Programs / Re: A skeleton code for Text Scroller via Drag-and-Drop

« on: December 26, 2021, 11:48:14 pm »

I was able to speed up the spell-check a little (by ~300 ms, down to 1,677 ms) by checking each word only against dictionary words of equal length.
That is, the previous stupid checking of each word against the whole 1,000,000+ word Binary-Search pool is now limited to THE fraction of the pool holding words of the corresponding length, which lowers the number of seeks/comparisons.
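The TXT2HTM internals are not shown in the post, so this is only a minimal C sketch of the idea, assuming the wordlist is held in memory as an array of C strings: sort it by length first (then alphabetically), record where each length starts, and run `bsearch` only inside the matching bucket. `MAX_LEN`, `Wordlist` and the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 64 /* assumption: longest searchable word */

typedef struct {
    const char **words;          /* sorted by (length, then strcmp) */
    size_t count;
    size_t bucket[MAX_LEN + 2];  /* words of length L are at [bucket[L], bucket[L+1]) */
} Wordlist;

static int cmp_len_then_lex(const void *a, const void *b) {
    const char *x = *(const char *const *)a, *y = *(const char *const *)b;
    size_t lx = strlen(x), ly = strlen(y);
    if (lx != ly) return lx < ly ? -1 : 1;
    return strcmp(x, y);
}

/* Sorts `words` in place and builds the per-length start table. */
void wordlist_build(Wordlist *wl, const char **words, size_t count) {
    qsort(words, count, sizeof *words, cmp_len_then_lex);
    wl->words = words;
    wl->count = count;
    size_t i = 0;
    for (size_t len = 0; len <= MAX_LEN + 1; len++) {
        while (i < count && strlen(words[i]) < len) i++;
        wl->bucket[len] = i; /* first word with length >= len */
    }
}

static int cmp_key(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Binary search restricted to the words that have the same length as `word`. */
int wordlist_contains(const Wordlist *wl, const char *word) {
    size_t len = strlen(word);
    if (len > MAX_LEN) return 0;
    size_t lo = wl->bucket[len], hi = wl->bucket[len + 1];
    return bsearch(word, wl->words + lo, hi - lo, sizeof *wl->words, cmp_key) != NULL;
}
```

Inside one bucket all words have the same length, so plain `strcmp` order agrees with the (length, then strcmp) sort order, which is what makes the restricted `bsearch` valid.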

Additionally, all UNFAMILIAR words are now dumped as an index placed at the very beginning, in order of appearance.
Thanks to that feature, one can quickly spot misspelled words; with only a few glances I found 2 buggy "daughters":
daughers
daughers

Project Gutenberg failed to deliver quality here; obviously the proofreaders still rely on manual methods.

Finally, we have a decent benchmark that uses Binary-Search in a speedy manner. It is a good start for all kinds of experiments, since it produces the .HTM counterpart of our .TXT file (with all unfamiliar words in BOLD; all distinct OED words are used as the spell-check wordlist).

It throws all the words of the KJV Bible at the OED 2nd edition, i.e. 700,000+ words against 1,000,000+ OED words.

The spotted errors can go, along with the entire .HTM, into a .DOC|.PDF file for hardcopy, i.e. paper reviewing with good old pencils.

Spell-checking the KJV Bible in 1.677 seconds: can we do better?

On Linux, the newer GCC (v10.2.1) generates 100ms faster code, lowering the conversion down to:

Anyway, this TXT2HTM code will be part of the upcoming Masakari r.9, putting spell-checked .HTMs only a keypress and a few seconds away...

The source code and binaries for Linux and Windows (also the OED wordlist) are in the attached package.

Programs / Re: Get frequency in sound using _Vince algorithm

« on: December 26, 2021, 04:11:16 am »

Glad to share the first operational revision; the QB64 source and the Windows executable (64-bit, -O3 -mavx2) are in the attached file.
To play a WAV/MP3, just give its name as a parameter on the command line. Use quotes if necessary.

My i5-7200U struggled big time when the FFT size was raised from 1024 to 4096 samples; at 8192 the tearing is nasty.
In the next revisions I intend to precalculate the 8192-sample chunks and JUST draw the magnitudes.
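Some bookkeeping behind these chunk sizes, as a minimal sketch. Assuming 44.1 kHz audio (the posts do not state the rate), an 8192-point FFT has bins about 5.38 Hz wide and spans about 186 ms of audio, while at 25 frames per second playback advances only 1,764 samples per frame, so consecutive 8192-sample windows overlap heavily. The helper names are hypothetical.

```c
/* Basic bookkeeping for an n-point FFT over audio sampled at sample_rate Hz. */

/* Width of one frequency bin in Hz. */
double bin_width_hz(double sample_rate, unsigned n) {
    return sample_rate / n;
}

/* Time span covered by one n-sample window, in milliseconds. */
double window_ms(double sample_rate, unsigned n) {
    return 1000.0 * n / sample_rate;
}

/* Index of the bin containing frequency f (truncated toward zero). */
unsigned bin_of(double f, double sample_rate, unsigned n) {
    return (unsigned)(f * n / sample_rate);
}

/* Samples of playback that elapse between two frames at the given frame rate. */
double hop_samples(double sample_rate, double fps) {
    return sample_rate / fps;
}
```

Under the same assumption, a 1 kHz to 5 kHz vocal band falls into bins 185 to 928 of an 8192-point transform.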

Also, I encountered a nasty noise-like problem in the decoded stream (bug?!): many sequences with unnaturally high values make the visual output blink badly. What is this, anyone?!

EDIT, 2021-Dec-29:
The bug is fixed; revision 2 is now fully functional. I hate not having enough time to finish something; anyway, I found it: the problem was reading the decompressed stream at offsets not divisible by 2, duh.
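Why odd offsets produce "unnaturally high values" can be shown with a minimal C sketch (not the e-pi-ALTES code, which is QB64; assumes 16-bit little-endian PCM). Reading a sample one byte off pairs the high byte of one sample with the low byte of the next, and that low byte lands in the high, sign-carrying position. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one signed 16-bit little-endian PCM sample starting at byte `off`. */
int16_t read_s16le(const uint8_t *buf, size_t off) {
    int v = buf[off] | (buf[off + 1] << 8); /* 0..65535 */
    if (v >= 32768) v -= 65536;            /* apply the two's-complement sign */
    return (int16_t)v;
}

/* Rounds a byte offset down to the start of a sample frame
   (block_align = channels * bytes per sample, e.g. 4 for 16-bit stereo). */
size_t align_to_frame(size_t off, size_t block_align) {
    return off - off % block_align;
}
```

With two quiet samples, +1 and -2 (bytes 01 00 FE FF), aligned reads return 1 and -2, while a read starting at byte offset 1 returns -512.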

I tested e-pi-ALTES with Japanese narration (Sekirei, Ninja Resurrection) and the visualization is as it should be: the vocal range is well presented, with some nice bell shapes in the Cyan zone (1 kHz to 5 kHz). A visualizer of frequency magnitudes that fails to show a RICH presence in the vocal range is of no use, at least to me.

Visualizing the male vocal of the demon advisor Mori:

You may press Alt+Enter to toggle between fullscreen and windowed modes. On my i5-7200U the transform runs at 25 frames per second for the largest FFT chunk, 8192 samples; the chunk size can be 128, 256, 512, 1024, 2048, 4096 or 8192.

As always, the source code is in the attached package, also 32bit and 64bit executables for Windows.

Enjoy!

Programs / Re: text file browser

« on: December 24, 2021, 01:23:44 am »

Hi Colonel_Panic,
in case you haven't seen my text browser, you are welcome to chop whatever part you like from my Masakari source.

If you discard the unwanted parts, or rather salvage the useful ones, you can end up with an even simpler viewer, which is always nice.

Programs / Re: Get frequency in sound using _Vince algorithm

« on: December 24, 2021, 01:10:58 am »

Glad to share the first DRAFT revision;
the QB64 source plus the 32-bit binary are attached.

The bars jump too fast (even at 30 frames per second), but this is as RAW as it gets; smoothing the animation needs some tweaks, and at the moment I have no idea which.
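One common way to tame jumping bars (a suggestion, not something the post settles on) is "fast attack, slow release": a bar jumps up to a louder value immediately but may fall only a limited amount per frame. A minimal C sketch follows; the linear fall and the name `smooth_bars` are assumptions, and an exponential decay would work along the same lines.

```c
#include <stddef.h>

/* Fast attack, slow release: `shown` holds the bar heights currently drawn,
   `target` the fresh magnitudes for this frame, `fall` the maximum drop per frame. */
void smooth_bars(float *shown, const float *target, size_t n, float fall) {
    for (size_t i = 0; i < n; i++) {
        if (target[i] >= shown[i]) {
            shown[i] = target[i];             /* attack: follow peaks at once */
        } else {
            float lowered = shown[i] - fall;  /* release: limited drop per frame */
            shown[i] = lowered > target[i] ? lowered : target[i];
        }
    }
}
```

Called once per frame before drawing, it keeps peaks visible for a few frames instead of letting the bars flicker between extremes.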

Critical feedback is welcome, as are suggestions on what can be improved.

Programs / Re: Binary Sequence Predictor "game"?

« on: December 23, 2021, 03:30:25 am »

Interesting. Some time ago a fellow member posted a word-guesser; my suggestion is to try something new, namely a "Letter-Guesser".

Instead of zeros and ones, the game would allow typing letters into a string (it could be more space-efficient if the string is placed in a box instead of a line) and, in REAL-TIME, heh-heh, guess the next letter. Whether or not the previous input forms meaningful words, it would be a useful ... auto-completion etude.
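A Letter-Guesser could start from letter-pair (bigram) counts. A minimal C sketch, ASCII letters only, with hypothetical names: train on some text, then guess the letter that most often followed the last typed one.

```c
#include <ctype.h>
#include <string.h>

/* counts[a][b] = how often letter b directly followed letter a (ASCII a-z only). */
typedef struct {
    unsigned counts[26][26];
} LetterModel;

void letter_model_init(LetterModel *m) {
    memset(m, 0, sizeof *m);
}

/* Counts letter pairs inside words; spaces and punctuation break the chain. */
void letter_model_train(LetterModel *m, const char *text) {
    int prev = -1;
    for (const char *p = text; *p; p++) {
        int c = tolower((unsigned char)*p);
        int cur = (c >= 'a' && c <= 'z') ? c - 'a' : -1;
        if (prev >= 0 && cur >= 0) m->counts[prev][cur]++;
        prev = cur;
    }
}

/* Most frequent follower of `last` (first one wins on ties), or 0 if unknown. */
char letter_model_guess(const LetterModel *m, char last) {
    int c = tolower((unsigned char)last);
    if (c < 'a' || c > 'z') return 0;
    int best = -1;
    unsigned best_count = 0;
    for (int j = 0; j < 26; j++) {
        if (m->counts[c - 'a'][j] > best_count) {
            best_count = m->counts[c - 'a'][j];
            best = j;
        }
    }
    return best < 0 ? 0 : (char)('a' + best);
}
```

Trained on "the then there", it guesses 'h' after 't' and 'n' after 'e'; a real game would use a large corpus and, preferably, longer contexts than a single letter.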

Writing a phrase-guesser, derived from millions of bi-grams (as a start), is on my to-do list for next year.

For a long time I have wanted to have (and share) a simple application that allows typing English words (in a kind of search box), where the word currently being typed is PAIRED with the previous one, forming a bi-gram that is matched against some big-ass bi-gram corpus. Then when, for example, I type "Sylvester St", the predictor would suggest a few bi-grams, so people who don't know the beloved actor's family name would be saved from the shame.
Here is what Google suggests when typing "Sy":

Oh, and to boost its practicality: if the user starts off on the wrong foot, e.g. "Silvestor St", the predictor should be ready to fire the correct "Sylvester Stallone", i.e. it should be capable of detecting "errors" in the word before the last one as well.
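A minimal C sketch of both halves of that idea, assuming the bi-gram corpus fits in memory as a sorted array of lowercase strings: a lower-bound binary search finds the contiguous run of bi-grams sharing the typed prefix, and a Levenshtein edit distance can flag a near-miss first word such as "silvestor" (distance 2 from "sylvester"). All names here are hypothetical, and ranking suggestions by corpus frequency is left out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64 /* assumption: longest word handled by edit_distance */

/* First index whose entry is >= key in a strcmp-sorted array (lower bound). */
size_t lower_bound(const char *const *sorted, size_t n, const char *key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(sorted[mid], key) < 0) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Copies up to `max` bi-grams starting with `prefix` into `out`; returns the count.
   All entries sharing a prefix are contiguous in sorted order. */
size_t suggest(const char *const *sorted, size_t n, const char *prefix,
               const char **out, size_t max) {
    size_t len = strlen(prefix), k = 0;
    for (size_t i = lower_bound(sorted, n, prefix); i < n && k < max; i++) {
        if (strncmp(sorted[i], prefix, len) != 0) break;
        out[k++] = sorted[i];
    }
    return k;
}

/* Levenshtein distance (insert/delete/substitute cost 1), two-row version.
   Returns (size_t)-1 for words longer than MAX_WORD. */
size_t edit_distance(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t prev[MAX_WORD + 1], cur[MAX_WORD + 1];
    if (la > MAX_WORD || lb > MAX_WORD) return (size_t)-1;
    for (size_t j = 0; j <= lb; j++) prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1, ins = cur[j - 1] + 1;
            size_t best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}
```

For "Silvestor St", one approach would be to replace the word before the last with the closest known first word (by edit distance) and then run `suggest` on the corrected prefix.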

Programs / Re: Fast Fourier Transform

« on: December 23, 2021, 02:17:43 am »

Many thanks for sharing!
If I am successful, in the next few days I will share my BAR-JUMPER, which gives visual feedback for MP3/WAV files...

Programs / Re: Get frequency in sound using _Vince algorithm

« on: December 23, 2021, 12:32:42 am »

I love seeing those jumping bars, and I am eager to announce what is on my mind: e-pi-ALTES, the QB64 Audio_(Frequency_Spectrum)_Visualizer.

Three things inspired me to name my bar-jumper after the famous traitor:
- To honor Euler with 'e', and 'π' as well, as they are at the basis of the DFT (Discrete Fourier Transform);
- The etymology being "one who leaps upon";
- The superb dialogue in the movie '300', emphasizing the "beating heart", where the heart is the audio and spear/shield/sword are different frequencies:

Ephialtes : My father trained me to feel no fear to make spear and shield and sword as much a part of me as my own beating heart!
King Leonidas : [Ephialtes shows King Leonidas his thrust; it's good and the King is surprisingly impressed] A fine thrust.

Indeed those jumping bars resemble the thrust of Epialtes' spear.

Note: some write 'ph' and pronounce it as 'f'; however, the Greek historian Herodotus spelled it Ἐπιάλτης, and if you listen to the movie, the traitor himself pronounces P, not F.

Programs / Re: Get frequency in sound using _Vince algorithm

« on: December 23, 2021, 12:19:58 am »

Quote from: _vince on January 17, 2021, 12:35:01 pm

Nice, looks interesting. I'll have to update QB64 at some point to get the new memsound. I wish I would get more involved in this but I haven't had much interest in qb64 recently. Btw, I did not invent these algorithms, it's decades old math translated into qb64.

Hi Vince and Petr,
thanks for sharing this exciting etude.

I want to write (with help and feedback from the community) a decent (at least not crappy) MP3/WAV frequency visualizer; I will share my code in the next few days, at the moment I have a draft. The main goal is for the jumping bars to correspond to the actual frequency peaks, to be not only synchronized but also to drop no "frames".

By the way, @_vince, what source did you use? Can you share the link?
If your code fails to deliver, my intention is to write my own DFT etude, not an FFT though, just simplicity itself.

QB64 Discussion / Please make the return of original QuickBASIC 'Ctrl+Y' shortcut

« on: December 19, 2021, 11:58:04 pm »

Speaking of QB64 IDE functionality, please consider adding the legacy shortcut 'Ctrl+Y' in future releases.
I love the current 2.01, but it falls short every time I need to cut the current line into the clipboard (removing it); I used that shortcut very often in the past, truly useful it is.

Since 'Ctrl+Y' is currently mapped to REDO, I have a suggestion: remove REDO altogether and mimic the Photoshop scheme:

UNDO - Ctrl+Z
Step Forward - Shift+Ctrl+Z
Step Backward - Alt+Ctrl+Z
