In preparation for my QB64 coding for 128+ bit math, I have run a number of modules, each deliberately written in a similar coding style (the essential difference between modules is the type of variable element used and the memory allocated for it).
The objective was to simulate INC (increment, +1) with the eventual aim of scalability (i.e. going beyond 128 bits). The modules were all scaled down to 64 bits (to allow easy cross-checking against standard INTEGER64 code in the near future) and, as tested, did nothing useful (i.e. the QB64 equivalent would be an "empty" FOR NEXT loop). The test results (shown below) are for 16 INCs in succession, with each "trial" repeated 128 times.
The methodology used was purely bit-wise (i.e. base 2), so only one bit at a time was stored in each variable element. For instance, an INTEGER64 array stored only one bit value per array element (1 bit used out of 64, i.e. ~2% memory efficient).
The overall aim was to determine which combination of variable element and memory allocation would produce a distinct gain in calculation time.
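A minimal sketch of the one-bit-per-element INC described above, assuming an _INTEGER64 array with element 0 as the least significant bit. The names (bits, check, rebuilt) and the carry loop are illustrative only and are not taken from the tested modules.

```qb64
' Sketch only: one bit per array element, element 0 = least significant bit.
CONST NBITS = 64
DIM bits(0 TO NBITS - 1) AS _INTEGER64 ' each element holds 0 or 1
DIM check AS _UNSIGNED _INTEGER64 ' native counter used for cross-checking
DIM rebuilt AS _UNSIGNED _INTEGER64
DIM i AS LONG, n AS LONG

FOR n = 1 TO 16
    ' INC: clear trailing 1 bits, then set the first 0 bit found
    i = 0
    DO WHILE i < NBITS
        IF bits(i) = 0 THEN bits(i) = 1: EXIT DO
        bits(i) = 0 ' carry ripples to the next bit
        i = i + 1
    LOOP
    check = check + 1
NEXT n

' Rebuild the value from the bit array, most significant bit first
FOR i = NBITS - 1 TO 0 STEP -1
    rebuilt = rebuilt * 2 + bits(i)
NEXT i
PRINT rebuilt; check ' the two values should match (16 after 16 INCs)
```

Because the loop only walks as far as the first 0 bit, the same code should scale past 64 bits by changing NBITS, although the rebuild step used for cross-checking only holds up to 64 bits.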
In the process of this development I gained some experience in using MEM blocks and UDT.
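For comparison, here is a rough sketch of the same INC using a MEM block instead of an array. It assumes one _UNSIGNED _BYTE per bit, which may not match the element size used in the tested modules; the variable names are hypothetical.

```qb64
' Sketch only: 64 bits held in a MEM block, one byte per bit (an assumption).
CONST NBITS = 64
DIM m AS _MEM
DIM b AS _UNSIGNED _BYTE
DIM i AS LONG, n AS LONG

m = _MEMNEW(NBITS)
_MEMFILL m, m.OFFSET, NBITS, 0 AS _UNSIGNED _BYTE ' zero the block before use

FOR n = 1 TO 16
    i = 0
    DO WHILE i < NBITS
        _MEMGET m, m.OFFSET + i, b
        IF b = 0 THEN _MEMPUT m, m.OFFSET + i, 1 AS _UNSIGNED _BYTE: EXIT DO
        _MEMPUT m, m.OFFSET + i, 0 AS _UNSIGNED _BYTE ' carry to the next bit
        i = i + 1
    LOOP
NEXT n

' Show the low 8 bits, most significant first
FOR i = 7 TO 0 STEP -1
    _MEMGET m, m.OFFSET + i, b
    PRINT LTRIM$(STR$(b));
NEXT i
PRINT
_MEMFREE m ' release the block when finished
```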
[Attachment: tabulated timings and histogram - not viewable here]
Notes
In the tabulated values section, inc = 0 should read inc = 16
The horizontal axis is a linear time scale (0 to 3 seconds)
The vertical axis is a linear count (one count per pixel) for the respective modules.
Some modules are known to be in error
() designates QB64 arrays being used - one element per bit
memory and screen are memory allocations made using MEM coding
screen_0 and screen_2 are the old DOS memory allocations (&HB800 and &HA000 addresses) and are NOT accessed via MEM
udt means a user defined type (UDT) was used
direct means that the allocation unit is "hard-coded" (i.e. a variable a03% is used instead of an array element a%(03))
TIMER_empty_loops is actually a module without any code in the loop (here 16x), used to estimate the timing overhead of the modules
TIMER_null is the difference between two successive timer calls (i.e. the resolution of the timing process for two successive enquiries)
The tabulated values represent minimum, average, maximum timings for the collection of 128 trials per module
The associated histogram shows all data points above a line of half-intensity (the line spans minimum to maximum). No data point is drawn on the same pixel as another, and all are drawn above the line.
The program was run with the computer disconnected from the internet and from external devices, and with essentially no other software running.
All modules use essentially the same coding style; only the variable elements differ. The "heart" of each module, INC, was the equivalent of only 3 lines of functional QB64 code per iteration.
All modules were tested in "SLOW" mode, i.e. to roughly simulate actual usage. For instance, at each iteration of INC the binary string representation was displayed as text on a graphics screen, and a SELECT CASE block was used to redirect as necessary depending on a particular bit field value.
The timings (unit = second) do not include any setup overheads (zeroing etc.) and assume the initial value (for the INC) was preloaded and valid.
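The timing scheme in the notes above (128 trials of 16 iterations, reporting minimum, average and maximum) can be sketched roughly as follows. This is not the actual test program: the use of TIMER(.001) and all variable names are assumptions, and the loop body is left empty in the manner of TIMER_empty_loops.

```qb64
' Sketch only: min / average / max timing over 128 trials of 16 iterations.
CONST TRIALS = 128
CONST INCS = 16
DIM t0 AS DOUBLE, dt AS DOUBLE
DIM tMin AS DOUBLE, tMax AS DOUBLE, tSum AS DOUBLE
DIM trial AS LONG, n AS LONG

tMin = 1E+30 ' larger than any possible timing
FOR trial = 1 TO TRIALS
    t0 = TIMER(.001) ' assumed millisecond accuracy setting
    FOR n = 1 TO INCS
        ' module under test goes here (left empty = overhead estimate)
    NEXT n
    dt = TIMER(.001) - t0
    IF dt < 0 THEN dt = dt + 86400 ' TIMER restarts from 0 at midnight
    IF dt < tMin THEN tMin = dt
    IF dt > tMax THEN tMax = dt
    tSum = tSum + dt
NEXT trial

PRINT "min ="; tMin; " avg ="; tSum / TRIALS; " max ="; tMax
```

Removing the inner FOR NEXT loop altogether would give something like the TIMER_null figure, i.e. the difference between two successive timer calls.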
Eventually it is planned to use resources as mentioned/written by @SMcNeill, @luke, @bplus and @jack as the program develops past INC to the more common modes. These other resources would be used to cross-check my results.