Topic: Confused about size of compiled code: removing most code increases compiled size

Offline hanness

  • Forum Regular
  • Posts: 210
I am using the Aug 24th dev release of 64-bit QB64 on Windows 10, 21H1.

I have a program that has approximately 11,200 lines of code.

The program has a menu allowing the user to select from a list of quite a few different operations to perform.

I wanted to take just one of those operations and break it out into a separate standalone program. The new program is basically just a cut and paste of one section from the original program with the following changes:

1) There are MANY variables used in the original program that I no longer needed, so I removed the DIM statements for all the now unused variables and arrays. Note that I also did not add a single new variable or array.

2) I did add a few PRINT statements to the new code, but we are talking maybe 30 lines total.

3) The code has been cut down from about 11,200 lines to roughly 1,900 lines.

4) The menu has been removed since the program is now a single function program.

5) Any subroutines and procedures not needed by the abbreviated code have been removed.

So here is the odd thing that prompted my question:

The original 11,200+ line code compiles to 3.23MB in size.
The 1,900 lines of code compile to 4.71 MB.

Neither program is compiled with $DEBUG turned on. As far as I can recall, I have not made any other changes.

Is there some way that I can determine why the drastically smaller program actually compiles to a larger size?

NOTE: This is not at all critical, but it does have me curious.

FellippeHeitor

  • Guest
Libraries that get compiled per program when certain commands are used: sound, fonts, printer, images. Does the new program include any of these that didn't exist in the main program?

Notice that even a BEEP will include the whole sound library, for example.
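
One rough way to check this is to scan both .bas files for keywords from those library groups and compare the results. The Python sketch below is only an illustration: apart from BEEP, _LOADFONT and _FONT, which are named in this thread, the keyword-to-library mapping is an assumption and may not match what QB64 actually checks, and the function names are made up for the example.

```python
import re

# ASSUMPTION: only BEEP, _LOADFONT and _FONT are named in this thread as
# triggers; the other keywords are plausible guesses, not a verified list.
LIBRARY_KEYWORDS = {
    "sound": ["BEEP", "SOUND", "PLAY", "_SNDOPEN", "_SNDPLAY"],
    "fonts": ["_LOADFONT", "_FONT"],
    "printer": ["LPRINT", "_PRINTIMAGE"],
    "images": ["_LOADIMAGE"],
}


def strip_strings_and_comments(line):
    """Blank out string literals, then drop ' and REM comments."""
    line = re.sub(r'"[^"]*"', '""', line)
    line = line.split("'", 1)[0]
    return re.sub(r"\bREM\b.*", "", line, flags=re.IGNORECASE)


def libraries_used(source):
    """Map each library group to the keywords found in the BASIC source."""
    found = {}
    for line in source.splitlines():
        code = strip_strings_and_comments(line)
        for library, keywords in LIBRARY_KEYWORDS.items():
            for keyword in keywords:
                pattern = r"\b" + re.escape(keyword) + r"\b"
                if re.search(pattern, code, flags=re.IGNORECASE):
                    found.setdefault(library, set()).add(keyword)
    return found
```

Running libraries_used() on the text of the old and the new program and comparing the two results would show whether the smaller program picked up a library group the larger one never used. A variable that happens to be named like a keyword would produce a false hit, so treat the output as a hint only.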

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
My guess would be this: “2) I did add few PRINT statements to the new code, but we are talking maybe 30 lines total.”

Print is a terrible beast of a command — look at all it does carefully and try and write all its functionality into a command of your own sometime, and you’ll get an idea of how complex it is.  Print letters and numbers, separated by comma spacing, new lines, and nothing, with support for USING to format output….

Try this:

PRINT a; b; c; d; e; f; g; h; i; j; k; l; m; n; o; p; q; r; s; t; u; v; w; x; y; z

One print statement which prints the value of 26 variables beside each other…

One line of BAS code…

Now go into internal/temp/main.txt and see how that single line gets translated.  My guess would be it’ll create almost 300 lines of C++ code.  (It’s been some time since I last looked, so that’s honestly just a guess, but it’s close to what my memory tells me it’ll become.)
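
To measure rather than guess, the translated file can simply be line-counted after a compile. A small Python sketch, assuming you pass in the folder where QB64 is installed; the function names are hypothetical, and internal/temp/main.txt is the file mentioned above.

```python
from pathlib import Path


def count_lines(path):
    """Return the number of lines in a text file, tolerating odd bytes."""
    with open(path, "r", errors="replace") as handle:
        return sum(1 for _ in handle)


def translated_line_count(qb64_dir):
    """Count the lines of internal/temp/main.txt under the given QB64 folder."""
    # qb64_dir is wherever QB64 is installed on your machine (an assumption).
    return count_lines(Path(qb64_dir) / "internal" / "temp" / "main.txt")
```

Compiling the one-line test first and then calling translated_line_count() on your QB64 folder gives the per-statement figure; repeating it with more PRINT lines shows how fast the translated output grows.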

For another test, copy and paste that single line into the IDE 1000 times.  (Paste it 10 times, then copy those 10, paste them 10 times, then copy those 100 and paste them 10 times.)

Try and compile 1000 lines of PRINT identical to the one above…

CONGRATULATIONS!!!  You just broke g++!!  The compiler won’t compile your translated C++ program as it’s now roughly 300,000 lines (1000 statements at about 300 lines each), and g++ will run out of memory/stack space trying to compile it!

ONLY a 1000 line BAS program, but translated, it’s more than g++ can handle!
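
Pasting by hand works, but the test file can also be generated. The following Python sketch writes the same 26-variable PRINT line as many times as requested; the output file name printstress.bas and the function names are made up for this example.

```python
import string


def print_statement(variables=string.ascii_lowercase):
    """Build one PRINT line listing every variable, separated by semicolons."""
    return "PRINT " + "; ".join(variables)


def stress_source(copies=1000):
    """Return BASIC source made of `copies` identical PRINT lines."""
    return "\n".join([print_statement()] * copies) + "\n"


def write_stress_program(path="printstress.bas", copies=1000):
    """Write the generated source to `path` and return the line count."""
    source = stress_source(copies)
    with open(path, "w") as handle:
        handle.write(source)
    return len(source.splitlines())
```

Calling write_stress_program("printstress.bas", 1000) produces the 1000-line file described above; smaller values of copies make it easy to look for the point where the compile starts to fail.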



Just for curiosity’s sake, remark out the PRINT statements, and see how that affects the size.  Without the actual code to look at, my first guess is that the added PRINT statements are the cause.
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline hanness

  • Forum Regular
  • Posts: 210
I'll try your suggestion, but I highly doubt that is the issue. Yes - I did add about 30 print statements, but I also probably eliminated several hundred print statements when the code was cut down from 11,000+ lines to about 1,900 lines.

Thanks for the suggestions.

Offline Cobalt

  • QB64 Developer
  • Forum Resident
  • Posts: 878
  • At 60 I become highly radioactive!
I remember back on .NET with our one eccentric friend, who liked to print each and every variable and character on its own PRINT line, you made up a test program of like 4k prints that would not compile. I think while playing around I found the max a normal compile would handle was around 2200 or 2400 lines, though with some trickery I was able to compile up to something like 3350. (Can't remember exact numbers after so long.)

All that overhead on the C++ side really kills things. I want to say the translated C was something like 250k or 300k lines.
Really becomes an eye opener on just how important reasonable coding practices are.
Granted after becoming radioactive I only have a half-life!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
If you took out more PRINT statements than you added, my second question would be: Did you add any fonts to the code for readability?  Use of _LOADFONT or _FONT ends up pulling in the TrueType libraries, and they’d account for the difference in size.

If it’s not a PRINT or _FONT type increase, then all I can say is, “Share the code for us to diagnose differences”.  Just working in the dark makes it hard to do much more than make educated guesses based on past experiences, but those experiences might not actually match your situation at all.  ;)

Offline bartok

  • Newbie
  • Posts: 80
This topic leads me to ask a question that a professional programmer will surely find very silly, but it has been on my mind for a long time.
The most basic language of the computer is assembly, isn't it? I don't know at all how assembly communicates with the "core" of the PC, since assembly is also a "language" that the computer translates into its own language, which I vaguely know is a sort of sequence of 1s and 0s, even if I suppose they are not really written as "1" and "0", since the characters "1" and "0" are themselves a construction meant to be understood by a human.
However, any other language, R or QB64 for example, sits at a higher level than assembly, because every command is a program in itself, as was said for PRINT. So I understand that there is a "program" behind INPUT that makes the INPUT command work as INPUT. I suppose the INPUT command triggers a series of steps that finally arrives at the assembly code for INPUT, which the computer is able to understand.
OK, and here is my question. Before I used QB64, I thought that C++ was the same: a language at the same "level" as QB64. I also have a book for learning to program in C++. So, in my ignorance, I thought that an installation of C++, or an installation of QB64, has somewhere in its "system" directories everything necessary to translate code into assembly. But when I press F5 on QB64 code, if I'm not wrong, the code is translated into C++, not directly into assembly. So what transforms QB64 code --> C++ code --> assembly? Why the step between QB64 and C++, rather than going directly to a more basic level of communication with the computer? Was it the same with the old QBasic? Are there other steps between C++ and assembly?

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
When dealing with assembly, everything is machine specific.  QBASIC used to translate directly between BAS and executable, but it *only* worked on DOS based systems.  Every version of basic came with a page of PEEK/POKE memory addresses that were specific to each machine.  Apple BASIC isn’t the same as GW Basic and isn’t the same as TRS-80 Basic…

QB64 works to make itself as cross-platform compatible as possible, and to do this, it *doesn’t* translate BAS to EXE; instead it translates BAS to C++.  By using g++ to compile the translated code, QB64 uses the local copy of g++ on your machine to generate the proper executable for your system.

All we do is turn your BAS code into CPP code, and then g++ takes care of the rest of the work for us to create a suitable EXE for your OS/system.

Offline hanness

  • Forum Regular
  • Posts: 210
So, this is interesting. I added some functionality to my program. Basically, I integrated a user manual and help system into the program. The exact same changes were made to both programs. When I say exact, I do mean exact! In fact I copied from one and pasted to the other. Bottom line is that I added probably well over 1,000 new print statements, and maybe 30 or so input statements. The compiled size of the original program with more lines of code increased by maybe 100K or so (rough number). Interestingly, the program with the far fewer lines of code actually DECREASED in compiled size by about 1MB!

The result is that the problem is now resolved. The program with fewer lines of code is now smaller than the other program, as I had originally expected. It was never a critical issue, more curiosity, but it sure seems odd to me that adding 1,000+ lines of code would make the compiled program shrink by about 20% of its original size!

In case you are wondering: no, I did not remove any code. I only added code. The program does nothing at all fancy. It runs in a console window, gathers input from a user, displays information on the screen (absolutely no fonts are referenced, simply using whatever the default is), and runs a LOT of shell commands as we manipulate files using Windows command line tools. One run of this program can easily take 8 hours on a fast PC with hellafast SSDs, but that's because we are waiting on the commands run via SHELL to complete and we are processing large amounts of data.

Offline Cobalt

  • QB64 Developer
  • Forum Resident
  • Posts: 878
  • At 60 I become highly radioactive!
The takeaway is, unless you're trying to put your program on a 3 1/2" or 5 1/4" floppy (or their multitude of ancestors), you probably don't need to worry about the final size of your exe.

As OSes become bloated, so will the overhead code needed to run a simple program.

Offline bartok

  • Newbie
  • Posts: 80
Thanks. And where can I find the g++ compiler on the PC?


FellippeHeitor

  • Guest
If you're on Windows, QB64 ships it under internal/c/c_compiler
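
If you want to confirm it is there, a quick search of that folder will do. A Python sketch, assuming only the internal/c/c_compiler location given above; the exact subfolder and file name of the compiler are not stated in this thread, so the code just looks for any file named g++ or g++.exe, and find_gpp is a made-up helper name.

```python
from pathlib import Path


def find_gpp(qb64_dir):
    """Return every g++ executable found under internal/c/c_compiler."""
    compiler_dir = Path(qb64_dir) / "internal" / "c" / "c_compiler"
    if not compiler_dir.is_dir():
        return []
    # ASSUMPTION: the compiler binary is named g++ or g++.exe.
    wanted = {"g++", "g++.exe"}
    return sorted(
        path
        for path in compiler_dir.rglob("*")
        if path.is_file() and path.name.lower() in wanted
    )
```

Calling find_gpp() with your QB64 folder returns a list of matching paths, or an empty list if the compiler folder is missing.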