Author Topic: Looking for old program or help recreating it  (Read 8800 times)


Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #105 on: January 11, 2022, 03:12:52 pm »
STxAxTIC

LOL, I had to drag out another portable table and make a U-shaped configuration.
It was too time-consuming closing the lids so I could get to the back row to make
adjustments.  Overall the test yielded nothing: after 5 hours the best was around
90% and the overall average was in the high 70's.  Going to have to try something
else.

I hate to bother you with another request, but today I came up with another idea
for tuning.  If you're interested, the attached file contains comma-delimited entries of
the raw outputs of the 3 predictors plus a 4th that shows the overall hit counts for
both 0's and 1's, and the last entry in each line is the correct prediction value.

This data is compiled by taking the 3 predictors' output probabilities and then using
P^2, and nothing else.  The data contains 100 tests with a blank line at the end of
each run to separate them.  I used 40 strings in the test, so each run contains 40
results.

The first 4 columns are the (0) output probabilities, the second 4 are for the (1)'s.
The last entry is the actual value that showed in the next update.  The lines that
are all 1's are the strings that are made up of all zeros, i.e., no ones in the string
being processed.

The idea is to try to find a small range that could be added at the end of each
prediction represented by each row.  As it is now, I add the 3 predictors' outputs
and then divide by 3 for both 0's and 1's, then take whichever is higher and use
that value as the prediction.
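If it helps to pin the blending step down, here is a minimal Python sketch of what that paragraph describes (illustrative only, not the actual QB64 code): average the three predictors' probabilities for the 0 side and the 1 side, then predict whichever side has the higher mean.

```python
def blend_prediction(p0, p1):
    """p0, p1: the three predictors' output probabilities for class 0
    and class 1 respectively.  Average each side, predict the higher."""
    avg0 = sum(p0) / len(p0)
    avg1 = sum(p1) / len(p1)
    return 1 if avg1 > avg0 else 0

# The 0 side averages higher here, so the prediction is 0.
print(blend_prediction([0.60, 0.55, 0.70], [0.40, 0.45, 0.30]))  # -> 0
```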

Here is one of the best results from yesterday's test.  I was hoping for high 90's or maybe
even 100%.  No such luck.
 
Test # 64
Total Strings = 40
[P1 =18]  [Hits =14]  [Misses = 4]
[P0 =22]  [Hits =22]  [Misses = 0]
Actual 1's =14   H-Ratio =1
Actual 0's =26   H-Ratio =.8461
Overall hits=36 of 40
Predictors score =.9
1's Missed on string/s ->12,27,31,38
0's Missed on string/s ->
A=0001-0111-0000-0011-0001-0011-0001-0001-0001-0011
P=0001-0111-0001-0011-0001-0011-0011-0011-0001-0111
End of prediction.............


The goal of the file below would be to use it as a training set in order to calculate a lower and
upper threshold range that can be applied in the final stage to tip the output in the right direction
and so increase the overall hits.  This data was calculated without my so-called boost weights.

Anyway, mess with it if you have the time; otherwise please don't feel any obligation.  Just
throwing it out there.  I am going to calculate a running average and then a tipping value,
which would be the value needed to offset the prediction enough to swing it in
favor of the correct value.  Sounds simple enough, but it may need a new training set after each
update.
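One way to read the tipping-value idea is as a one-parameter search over the training rows: try candidate offsets added to the 1-side average and keep the offset that maximizes hits.  A rough Python sketch under that reading (my interpretation, with made-up row values, not the author's method):

```python
def best_offset(rows, step=0.01, lo=-0.5, hi=0.5):
    """rows: (avg0, avg1, actual) triples from the training file.
    Scan offsets added to the 1-side average; keep the best scorer."""
    best, best_hits = 0.0, -1
    offset = lo
    while offset <= hi:
        hits = sum(1 for a0, a1, actual in rows
                   if (1 if a1 + offset > a0 else 0) == actual)
        if hits > best_hits:
            best, best_hits = offset, hits
        offset = round(offset + step, 10)   # avoid float drift
    return best, best_hits

# Toy training rows: the plain average misses the two 1's,
# but a small positive offset recovers all three.
rows = [(0.6, 0.5, 1), (0.7, 0.4, 0), (0.55, 0.5, 1)]
off, hits = best_offset(rows)
print(off, hits)
```

As the post notes, an offset fitted this way is only as good as the training window, so it would likely need refitting after each update.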
 

 File image explaining the data structure:
 
https://i.postimg.cc/zDdmYrTx/help.png

     


Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #106 on: January 12, 2022, 05:51:09 pm »
Hi all

I was getting a bit discouraged with my project and decided to do a random test.
I set the prng to generate an integer between 0 and 100 and then used mod 2 to
get an odd/even value: even = 0 and odd = 1.  I ran 100 tests and the best hit
rate was 70%, with an overall average of just over 50%.
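For reference, here is a small Python sketch of that random baseline (using the standard library PRNG in place of the homemade time-seeded one): draw an integer from 0 to 100, take it mod 2, and score against the actual bits.

```python
import random

def random_baseline(actual_bits, seed=None):
    """Predict each bit by coin flip (integer 0-100, mod 2) and
    return the fraction of hits."""
    rng = random.Random(seed)
    hits = 0
    for bit in actual_bits:
        r = rng.randint(0, 100)   # integer between 0 and 100
        prediction = r % 2        # even -> 0, odd -> 1
        if prediction == bit:
            hits += 1
    return hits / len(actual_bits)

# 100 runs of 40 random bits: best run can spike well above 50%,
# but the average settles near 50%.
runs = []
for i in range(100):
    data_rng = random.Random(1000 + i)
    bits = [data_rng.randint(0, 1) for _ in range(40)]
    runs.append(random_baseline(bits, seed=i))
print(max(runs), sum(runs) / len(runs))
```

The spread matches the post: individual 40-bit runs swing widely (a lucky run near 70% is unsurprising), while the long-run average stays near chance.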

The project's best so far is in the mid 90's with an overall average of .775.  The overall
is showing around 27 points higher on average, so maybe I am making more progress
than I thought.

R1

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
Re: Looking for old program or help recreating it
« Reply #107 on: January 12, 2022, 06:04:38 pm »
Hey random1,

I've been trying to cook up a response but have been swamped with all other kinds of self-induced distractions. You brought up a lot of different things and I'm admittedly way far behind. Everyone else is still in last year with respect to this discussion, I think. It's just you and me.

I'm trying to understand what's happening in your latest update. Are you saying that you used QB64 to generate random data, and then through your models, you can get prediction scores in the 70s? How long are these samples? Because when I do this, I get 50% plus or minus a standard deviation. See why I'm picking on this discrepancy though? Something's responsible for 70%. What are those details?

Since the details are really piling on - and maybe you'll benefit from doing this the same way I did - are you able to write up (in some preservable format) what's happening on your end? Would it be too much for me to say "I'll take the answer to my question off the air, and instead wait for your book"? Just kidding...
You're not done when it works, you're done when it's right.

Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #108 on: January 13, 2022, 10:28:28 am »
STxAxTIC

It's my fault because I have been jumping around all over the place with respect to
just about everything.  I am working with several different data sets and with two
different predictors.

OK, for clarity I will stick to one data set, the attached file below named Temp.txt.  This file
contains 40 strings, each with 750 entries or data points.

What I did for the random test was to use my prng function, which is old as the hills and
needs updating.  It uses a time-based seed with a few extra things thrown in to try to
make it more random.

For the test mentioned in that post, I simply added the prng at the end of the prediction
process.  It ignores the predicted value and generates a random value instead.  The
prng generates a value from 1 to 100, which I will call (r).  After the value is generated
I use r = r mod 2.  If r > 0 then the prediction is (1); if r = 0 then the prediction is (0).
 
Because the prng is time based, I let the predictors run and just swap out the prediction
with the randomly generated value.  I could have used Rand(0,1) to get the value, but
by first generating a value between 1 and 100 and then using mod to get an odd/even
value, it seems to be more random.  The plan is to swap out the old prng with the
one you posted; I just haven't gotten to it yet.

In the random test the prng managed one 70% correct prediction across the entire 40-string
run.  Most, however, were in the high 40's, and some were as low as 30% correct.  With the
prng I am using, the test results were expected.  The overall average, gotten by storing the
individual results and then dividing to get an average, came out to be something like 50%.  Since
I am working with 0 or 1 as the prediction, I would think the results were as expected.

I have been playing around with the 3-in-1 predictor I mentioned early in the topic, specifically
with blending the outputs into a single prediction by playing with the weights.  I have two weights
for each predictor, one of which is a sort of static boost that is used when processing strings where
zeros outnumber the ones by a large margin.  Without these boost weights, the zero-biased strings
will always be predicted (0).  Probably not a good method, just another rabbit hole, grasping at straws
to try to improve the overall.

There is a fourth value that is generated for each string, which is the population of 0's and 1's within
the entire string being analyzed.  I haven't figured out how to use it, but I think it might be processed into
the main predictions to help with the zero-biased strings.

The best test score for the 3-in-1 predictor was 38 out of 40, or 95% correct.  Being the
stupid monkey that I am, I did not save the configuration settings for that run.  My best score
since then has been 92.5%, 37 out of 40 correct.  These 90%-plus hits are few and far
between, sad to say.  In general, running 100 tests I end up with an average overall score of
77% and a best of 90%.  Very few show better than 90%.

It seems that I have hit a wall and can't fine-tune beyond my current level.  I do, however, have
hope that I can somehow include the 4th value, i.e., the overall counts for both 0's and 1's, in
the prediction and push it a little higher.

If I compare the random test vs the 3-in-1 results, it does seem that I have made some
progress, but it turns out it's not enough to meet my needs.

A couple of posts back I posted a file that contains the raw outputs for each of the 3-in-1
predictors plus the 4th overall 0 and 1 counts.  This was generated to let me try to fine-tune the
predictor without running it.  The first four values are the 0's values, the second 4 are the 1's.
The last entry is the correct value, i.e., the value that should be predicted.

The file is comma delimited, which makes it easy to load the 3-in-1 predictors' outputs, which I then
play around with, trying to configure the weights in such a way as to increase the hits.  Doing a
40-string, 100-event test takes several minutes, but done this way, working with just the predictors'
raw outputs, it takes less than a second.

I did find your thoughts on the dissipation of patterns interesting.  Patterns emerge, run for some
extended length, and then dissolve for some unknown length of time, only to reappear.  Will they
start up again, and if so, when?  That's the million-dollar question.  Let's say we are working with a
6-digit binary pattern.  The pattern has a finite number of arrangements, 000000 to 111111, 2^6 = 64.  If
dealing with random strings, one would expect each to appear 1 in 64 on average, +/- some small
standard deviation.
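That expectation is easy to check empirically.  A short Python sketch (illustrative, using a seeded stdlib PRNG): slide a 6-digit window over a random bit string and count how often each of the 64 patterns appears.

```python
from collections import Counter
import random

rng = random.Random(42)
bits = ''.join(str(rng.randint(0, 1)) for _ in range(64_000))

# Count every overlapping 6-digit window.
counts = Counter(bits[i:i + 6] for i in range(len(bits) - 5))

expected = (len(bits) - 5) / 64    # 2^6 = 64 possible patterns
print(len(counts), round(expected), counts.most_common(1))
```

On random data all 64 patterns show up, each close to the expected count; a pattern that persistently over- or under-shoots that band is the kind of signal the post is asking about.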

Whenever we see patterns repeating at a greater-than-expected average, it seems that something
else may be going on that's not easily derived.  The lack of some patterns also adds to the
dilemma.

Anyway this is getting long so I will close. 

R1



   
 
« Last Edit: January 13, 2022, 10:35:36 am by random1 »

Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #109 on: January 13, 2022, 11:50:12 am »
STxAxTIC

I am posting a test from the strings posted above.  The attached pic shows the Mal-9K
predictor tool so that you can see the user settings and the overall results, which are
the last line in the lower window.

The user settings are divided by predictor, i.e., the 3-in-1 tool.  The first column is for
predictor 1, the second for predictor 2, etc.

The top entries control the string length for each predictor.
The second and third cells in the first column control the length of the pattern (the low/high alphas),
so this setup uses lengths of 4 to 5 digits.

Bwt's = the static boost weights for all 3 predictors

Lm1 to Lm3 are _Limit values in the main iteration loops, used only to make the left progress bars
look normal.  These are set to 7000 for speed reasons, which makes the bar graph unreadable; it
just flickers.

The lowest row's first cell sets the number of back-tests, the 2nd sets the blending method, and
the third activates the predictors.  It allows setting any or all three prediction tools active.
Here it is set to use all 3.

The second column's 2nd cell sets the sample size, and its 3rd sets the iterations for predictor 2.

The second and 3rd values in the third column set the 3rd predictor's gap start and finish.  Here
it is set to look at every 10th to every 60th value in the string.  The predictor first goes through
the string counting only every 10th entry, then every 11th, etc., up to gaps of 60.  Very simple tool.
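For concreteness, a small Python sketch of that gap scan as I understand it (a hypothetical helper, not the actual Mal-9K code): for each gap g from start to finish, walk the string at stride g and tally the 0's and 1's landed on.

```python
def gap_counts(bits, start=10, finish=60):
    """For each gap g in [start, finish], visit every g-th entry of
    the string and tally how many 0's and 1's are seen."""
    totals = {0: 0, 1: 0}
    for gap in range(start, finish + 1):
        for i in range(0, len(bits), gap):
            totals[bits[i]] += 1
    return totals

bits = [1, 0, 0, 1] * 200      # 800-entry toy string
print(gap_counts(bits))
```

A skew in these stride tallies relative to the string's overall 0/1 population would be one more signal a blending stage could weigh.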

https://i.postimg.cc/B6QSbPD1/test-run.png

The save button causes the program to write the predictor test data to a file.  These are the two
attached files below. 

The first file shows the raw data output with the boost value added.  The second shows the stats
for the 100 test predictions. 

Anyway, just throwing this out in case you're interested.  It might help you understand my posts.  I have a
place dug out for your predictor once you're totally finished with it.  Waiting patiently.

R1   


Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #110 on: January 13, 2022, 11:59:36 am »
P.S.

In the test above it's set to add the boost weight to the predictors' (0) output values,
which is part of blending method 1; blending method 2 adds the boost to the 1's output.
Just wanted to clear that up.  There are 5 different blending methods, of which #5 is the
random test option.  The blending methods can also be combined, but I have never tested
that option.

R1

Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #111 on: January 13, 2022, 02:50:13 pm »
Here's something interesting: for a while now I have noticed, when scrolling through the
mal.txt file, that many of the predictor's missed (1) predictions fall on every 4th string, i.e.,
4, 8, 12, 16, 20, 24, 28, 32, 36, 40.  This got me thinking that I need to add a reevaluation
algorithm for these strings, a second look, so to speak, based on some other type of
analysis whenever a (1) is predicted for these strings.  Maybe do the same thing for
other strings that produce the most missed (0) predictions.
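The gating part of that idea is trivial to express; the hard part is the second analysis itself.  A Python sketch of just the gate (hypothetical helper, not the actual program):

```python
def needs_second_look(string_index, prediction):
    """Flag any (1) prediction that lands on string 4, 8, 12, ... for
    re-evaluation by some other analysis (1-based string indices)."""
    return prediction == 1 and string_index % 4 == 0

flagged = [i for i in range(1, 41) if needs_second_look(i, 1)]
print(flagged)  # -> [4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
```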

https://i.postimg.cc/xqVJxr62/t44.png

R1
   

Offline random1

  • Newbie
  • Posts: 86
Re: Looking for old program or help recreating it
« Reply #112 on: January 15, 2022, 12:59:21 pm »
STxAxTIC

I think I have reached the end, and although the overall results are better than I expected,
they fall short of my needs.  The best result for a single 40-string run has been 38 of 40
correct, for a hit rate of 95%, but the overall average for 100 back-test runs has been 77%.

I think I have exhausted my options for the predictors I am working on and don't see any way
to improve them beyond their current rate.  Predictors 1 and 2 closely mimic the one you built,
with minor differences, so unless you find something magical to add, I think we can call
it finished.

Anyway, thanks for everything.

R1