Author Topic: Thinking outside of the box of QB64, HTML or JAVA involved (Read 6875 times)

doppler · « **on:** November 18, 2018, 08:12:51 am »

Let me start with this first: At no time is pron involved.

I visit a website I like. It's got lots of Manga pages. The pages are PNG, GIF and JPG. Typical! What is not typical is the changes they made to help them with on-the-fly HTML page encoding. In the past their coding had everything I needed to see all the material I wanted to WGET. Then they got smart. They recoded to a stock webpage form with JAVA-downloaded variables. This allows them to reduce the server load. Now I have to capture and enter two variables. Even then I don't have it all.

Here is the question: it involves HTML coding, aka Greasemonkey-type scripting. I need to pass the two variables out of the webpage to speed things up in my QB64 program. I know which variables I want, because I can see them in the webpage and via "web inspector" in Firefox. Right now it takes way too many clicks and mouse moves to use the inspector.

This may not be possible for some of the (anime reference) "S-Class" programmers here in the forum.
So is there a way to pull water out of this stone?

SMcNeill · « **Reply #1 on:** November 18, 2018, 08:45:12 am »

Which manga reader? Mangafox? Goodmanga? Mangakakalot?

It sounds like you're trying to use QB64 to view the images, and are looking for a way to read the volume/chapter info so you can pass it and get the manga pages?

If that's the basic issue, it's probably just a case of selective parsing, which shouldn't be too hard to do, as long as you can point us to a selected page to use as a reference tool to work from.

doppler · « **Reply #2 on:** November 18, 2018, 09:11:27 am »

The reader in question is mangadex.

I'm not using QB64 to view the images. It's being used to create the scripts and the WGET list required to pull the images off the server.
Then I can just use a simple offline viewer on my PC. Even after downloading the page HTML and parsing it for the good stuff, I still need
to know the root page of the chapter's content. That's done via a clipboard capture of the first image's source location. Java variable #1.
Then I have to hand-enter the total number of pages. Java variable #2.

I am pretty sure all the variables can be pulled out of the webpage's java download. And likely the other variable, the filename, too (it's in a java array).
The Mangadex reader is very good, but I don't always have the time to read online. Download and read offline is easier for me.

Not trying to re-invent the wheel, just transforming it from rough round to smooth round.
I already have something created that's usable. But it's labor intensive. I am by nature a lazy f*ck.

SMcNeill · « **Reply #3 on:** November 18, 2018, 09:33:13 am »

I'll grab a few pages later this evening and see about trying to get those variables. It doesn't seem like it should be too hard a thing to do.

SMcNeill · « **Reply #4 on:** November 18, 2018, 09:46:38 am »

Taking a quick look at the pages, the first manga I found was this one: https://mangadex.org/title/581/ouroboros

Isn't it just a simple case of reading the HTML page and basically grabbing the first link in it? (In this case, https://mangadex.org/chapter/474650 )
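
A minimal sketch of that "grab the first chapter link" idea, written in Python rather than QB64 purely for illustration. The sample HTML is made up, and the assumption that chapter links appear as relative `href="/chapter/<number>"` attributes is a guess, since the real page markup isn't shown in this thread. The function name `first_chapter_link` is hypothetical.

```python
import re

def first_chapter_link(html, base="https://mangadex.org"):
    """Return the absolute URL of the first /chapter/<id> link found, or None."""
    # Assumption: links look like href="/chapter/474650" in the page source.
    match = re.search(r'href="(/chapter/\d+)', html)
    return base + match.group(1) if match else None

# Made-up snippet standing in for a downloaded title page.
sample = (
    '<a href="/title/581/ouroboros">Ouroboros</a> '
    '<a href="/chapter/474650">Ch. 1</a> '
    '<a href="/chapter/474651">Ch. 2</a>'
)

print(first_chapter_link(sample))
```

The same search could be done in QB64 with INSTR and MID$ on the downloaded page text.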

Once you have the chapter, images start at 1 and increase sequentially up from there.

https://mangadex.org/chapter/474650/1
https://mangadex.org/chapter/474650/2
https://mangadex.org/chapter/474650/3

Once you get past the last page, it simply defaults back to the original title page for that series. (https://mangadex.org/title/581/ouroboros)
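
The sequential numbering described above could be turned into a list of reader-page URLs along these lines. This is only a sketch in Python; `chapter_page_urls` is a made-up helper name, and the page count still has to come from somewhere else.

```python
def chapter_page_urls(chapter_url, total_pages):
    """Build the reader URLs for pages 1..total_pages of one chapter."""
    base = chapter_url.rstrip("/")
    return [f"{base}/{n}" for n in range(1, total_pages + 1)]

for url in chapter_page_urls("https://mangadex.org/chapter/474650", 3):
    print(url)
```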

doppler · « **Reply #5 on:** November 18, 2018, 10:28:05 am »

I start by getting the link location: https://mangadex.org/chapter/484683 Then use the browser to display the first page.

By getting "copy image location" in browser I capture: https://mangadex.org/data/b6f07e5c7b7794e39105e5c8b4e14a7c/a1.png

From this I find the server root page. It's: b6f07e5c7b7794e39105e5c8b4e14a7c

The filename is a1.png. Without knowing the number of pages, I have to input that total. Then my WGET list becomes a text file of
https://mangadex.org/data/b6f07e5c7b7794e39105e5c8b4e14a7c/a1.png and a2.png, a3.png
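
For reference, the list-building step described here can be sketched as follows (Python used for illustration; the QB64 version would be the same string handling). It assumes every page shares the first page's prefix and extension, which, as noted further down, does not always hold. `build_wget_list` is a hypothetical name.

```python
import re

def build_wget_list(first_image_url, total_pages):
    """Expand the first image URL into one URL per page."""
    # Split ".../data/<root>/a1.png" into the server root and the filename.
    root, filename = first_image_url.rsplit("/", 1)
    # Filename is assumed to be <prefix><page number>.<extension>, e.g. a1.png.
    match = re.fullmatch(r"(\D*)(\d+)(\.\w+)", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    prefix, _, ext = match.groups()
    return [f"{root}/{prefix}{n}{ext}" for n in range(1, total_pages + 1)]

first = "https://mangadex.org/data/b6f07e5c7b7794e39105e5c8b4e14a7c/a1.png"
print("\n".join(build_wget_list(first, 3)))
```

Writing the joined text to a file gives something that can be handed to wget as an input list.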

The total number of pages is right there inside the java variable called "Total-pages". The other variables, the filename and the root page location, are there too.
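
If the page source has been saved to disk, pulling that number out is a plain text search. The sketch below is Python and rests on a loud assumption: the thread never shows how "Total-pages" actually appears in the source, so both sample strings are invented, and the pattern simply looks for the name followed by `=` or `:` and a number. `total_pages_from_html` is a hypothetical name.

```python
import re

def total_pages_from_html(html):
    """Return the page count found next to a 'total-pages' name, or None."""
    # Assumed forms: data-total-pages="23" or "Total-pages": 18 (both guesses).
    match = re.search(r'total-pages["\']?\s*[=:]\s*["\']?(\d+)', html, re.IGNORECASE)
    return int(match.group(1)) if match else None

# Invented examples; the real markup may differ.
print(total_pages_from_html('<div id="content" data-total-pages="23"></div>'))
print(total_pages_from_html('var reader = {"Total-pages": 18};'))
```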

It becomes a problem for me when the first page is PNG or JPG and all the other pages are the other type. Then my list has to be massaged.
After WGET exits with a failure code, I look at the captured directory contents and fetch the missing pages using the other types. My coding still needs
more work, because I didn't include other types like GIF and JPEG. I don't want to spend time on that until I get the JAVA variables under control.
Then I can re-code to fix it all, and likely get all the correct filenames in one shot too.
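
Until the real filenames can be read from the page, the "massage the list" step could look roughly like this: compare what landed in the download directory against the expected page numbers, and queue every missing page under the other extensions. This is a Python sketch under assumptions; the extension list and the name `retry_urls` are illustrative only.

```python
EXTENSIONS = ("png", "jpg", "jpeg", "gif")

def retry_urls(root_url, prefix, total_pages, downloaded, tried_ext):
    """List URLs to retry for pages missing from `downloaded`."""
    # Page stems already on disk, e.g. "a1" from "a1.png".
    have = {name.rsplit(".", 1)[0] for name in downloaded}
    others = [ext for ext in EXTENSIONS if ext != tried_ext]
    urls = []
    for n in range(1, total_pages + 1):
        stem = f"{prefix}{n}"
        if stem not in have:
            # Try each remaining extension; wget just fails on the wrong ones.
            urls.extend(f"{root_url}/{stem}.{ext}" for ext in others)
    return urls

root = "https://mangadex.org/data/b6f07e5c7b7794e39105e5c8b4e14a7c"
for url in retry_urls(root, "a", 3, ["a1.png", "a3.png"], "png"):
    print(url)
```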

Series I like I save and re-read. Something like "kitsune spirit" comes to mind.

Pete · « **Reply #6 on:** November 18, 2018, 12:33:39 pm »

I'm curious how you are getting any results from WGET on secure (https) pages. My experience with WGET is that it fails in general with https, but does a remarkably good job with http sites. I've noticed some issues with jpgs, pngs, etc., as you too seem to have found. For these reasons, I switched to cURL, but you have to parse out links to direct cURL to the pages you want to harvest. A neat feature of WGET is that it can grab an entire site.

Oh, and quit downloading "pron" whatever the hell that is. (Reread your original post. :)

Pete

doppler · « **Reply #7 on:** November 18, 2018, 12:56:30 pm »

Well Pete, it must have been a while since your last use of Wget. Https support has been in wget for some time now. I think as far back as 1.16.

The latest version is 1.19. There are way too many options to list. Most noticeable are setting the recursion depth and specifying the referer.
With wget I can go as deep as I want and fake who I am plus where I started from (i.e. the referer).
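
As an illustration of those options, here is a small Python sketch that assembles (but does not run) a wget command line. The long options used are standard GNU wget ones: `--input-file`, `--referer`, `--user-agent`, `--directory-prefix`, `--recursive` and `--level`. The file name, user-agent string and output directory are placeholders, and `build_wget_command` is a made-up helper.

```python
import shlex

def build_wget_command(list_file, referer, user_agent="Mozilla/5.0",
                       out_dir="pages", depth=None):
    """Return the wget argument list for fetching every URL in list_file."""
    cmd = [
        "wget",
        f"--input-file={list_file}",     # one URL per line
        f"--referer={referer}",          # where we claim to have come from
        f"--user-agent={user_agent}",    # who we claim to be
        f"--directory-prefix={out_dir}", # where the files are saved
    ]
    if depth is not None:
        # Only needed when following links rather than using a fixed list.
        cmd += ["--recursive", f"--level={depth}"]
    return cmd

cmd = build_wget_command("list.txt", "https://mangadex.org/chapter/484683")
print(shlex.join(cmd))
```

From QB64 the same command string could be passed to SHELL.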

Anything a website has been set up to allow, Wget can access.

Pete · « **Reply #8 on:** November 18, 2018, 02:36:18 pm »

It really hasn't been that long, maybe a year? Anyway, I can't access my Win 7 now, but when I get it up and running again, I'll check the version. I just downloaded v1.194, and I'll give that a go. It would be great to access https sites. I mean, it's fun to make parsers and easy to use cURL with QB64, but it is also nice to have an all-purpose site grabber. I nearly made one in QB64 last year, but its results are not as well organized as WGET's.

Thanks,

Pete
