Posts: 99 | Thanked: 36 times | Joined on Mar 2010
#1
I was wondering if there's any way to download a whole thread so I can read it offline. I know I can do a page at a time.

Thanks
 
dr_frost_dk
Posts: 1,503 | Thanked: 2,688 times | Joined on Oct 2010 @ Denmark
#2
I would also want this.
 
marxian
Posts: 2,448 | Thanked: 9,523 times | Joined on Aug 2010 @ Wigan, UK
#3
Do you want to download each page of the thread as an HTML file?

Code:
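#!/bin/bash
# Arguments: $1 = file name to save to, $2 = thread link, $3 = start page, $4 = end page
for ((i = $3; i <= $4; i++))
do
    echo "grabbing page $i"
    wget "$2&page=$i" -O "$1-page$i.html"
done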
A python alternative:

Code:
#!/usr/bin/python

import io
import urllib
import sys

def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        with io.open("%s-page%d.html" % (fileName, num), 'w') as out:
            # The with block closes the file, so no explicit close() is needed
            out.write(unicode(page, 'utf-8'))

if __name__ == '__main__':
    sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
In both cases you would supply the following arguments:

1. The filename you wish to save to (the page number is appended).

2. The link to the thread.

3. Start page (usually 1).

4. End page (usually the last page of the thread).
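
For anyone on a newer interpreter, here is a rough sketch of the same idea for Python 3. It is only an assumption about how a port might look, not a tested replacement: the helper names page_url and page_filename are made up for illustration, and the pages are written as raw bytes so that nothing has to be assumed about the forum's character encoding.

```python
#!/usr/bin/env python3
import sys
import urllib.request


def page_url(link, num):
    # Same URL scheme as the scripts above: append the page number to the link.
    return "%s&page=%d" % (link, num)


def page_filename(file_name, num):
    # The page number is appended to the file name given by the user.
    return "%s-page%d.html" % (file_name, num)


def get_thread(file_name, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print("grabbing page %d" % num)
        with urllib.request.urlopen(page_url(link, num)) as response:
            data = response.read()
        # Write the bytes exactly as received instead of decoding them.
        with open(page_filename(file_name, num), "wb") as out:
            out.write(data)


if __name__ == "__main__":
    if len(sys.argv) == 5:
        get_thread(*sys.argv[1:5])
    else:
        print("usage: FILENAME LINK START_PAGE END_PAGE")
```

The arguments are the same four listed above, in the same order.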
__________________
'Men of high position are allowed, by a special act of grace, to accommodate their reasoning to the answer they need. Logic is only required in those of lesser rank.' - J K Galbraith

My website

GitHub

Last edited by marxian; 2011-10-10 at 13:40.
 

Posts: 244 | Thanked: 354 times | Joined on Jul 2010 @ Scotland
#4
Click on Thread Tools, choose Printable Version, then "save as" or print to file (PDF).

Edit:

My mistake, both this and the archive/index.php resort to pagination for large threads.

Last edited by gregoranderson; 2011-10-10 at 13:55.
 
Posts: 99 | Thanked: 36 times | Joined on Mar 2010
#5
Thanks. I'm learning Python at the moment (early days), so I'll play with this tonight.

Cheers
 
pelago
Posts: 2,121 | Thanked: 1,540 times | Joined on Mar 2008 @ Oxford, UK
#6
If you use Firefox and want a point-and-click solution, look into the Re-Pagination add-on at https://addons.mozilla.org/en-US/fir...re-pagination/. This can turn a multi-page thread into one long page (this is useful if you want to search within a thread, by the way). Unfortunately you cannot then save the long page as HTML, but you could "print" it to a PDF if you have the correct software.
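
If the pages have already been saved to disk as name-pageN.html files, a single long page can also be stitched together with a few lines of Python. The following is a minimal Python 3 sketch under some loud assumptions: the function names are invented here, the files are assumed to be UTF-8 (undecodable bytes are replaced), and the page content is simply taken to be whatever sits between the body tags, so styles and scripts from the original pages are lost.

```python
import glob
import re

PAGE_RE = re.compile(r"-page(\d+)\.html$")


def page_number(path):
    # Pull N out of "...-pageN.html" so that page 2 sorts before page 10.
    match = PAGE_RE.search(path)
    return int(match.group(1)) if match else None


def body_of(html):
    # Keep only the text between <body ...> and </body>; fall back to everything.
    match = re.search(r"<body[^>]*>(.*)</body>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else html


def merge_pages(prefix, out_path, encoding="utf-8"):
    candidates = glob.glob(glob.escape(prefix) + "-page*.html")
    paths = [p for p in candidates if page_number(p) is not None]
    paths.sort(key=page_number)
    with open(out_path, "w", encoding=encoding) as out:
        out.write("<html><body>\n")
        for path in paths:
            with open(path, encoding=encoding, errors="replace") as page:
                out.write(body_of(page.read()))
            out.write("\n<hr>\n")
        out.write("</body></html>\n")
    return paths
```

For example, merge_pages("test", "test-all.html") would combine test-page1.html, test-page2.html and so on, in page order, into test-all.html.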
 

Posts: 99 | Thanked: 36 times | Joined on Mar 2010
#7
Ok,

Can someone point me in the right direction? I'm trying to run the Python code, and the below is where I'm at. I think I've defined the variables correctly?

I get an error saying 'test' is not defined. How do I correctly assign the file name variable?

Code:
#!/usr/bin/python

import io
import urllib
import sys

fileName = test
link = urllib.urlopen("http://talk.maemo.org/showthread.php?t=73315.html")
start_page = 1
end_page = 255

def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        #print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        with io.open("%s-page%d.html" % (fileName, num) , 'w') as file:
            file.write(unicode(page, 'utf-8'))
            file.close()

if __name__ == '__main__':
    sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))


I'm not after the full solution, just a friendly pointer in the right direction.

Thanks

Last edited by ziggadebo; 2011-10-10 at 19:22. Reason: changed code
 
marxian
Posts: 2,448 | Thanked: 9,523 times | Joined on Aug 2010 @ Wigan, UK
#8
Your code contains the following errors:

1. The fileName variable should be a string, i.e. "test". Your code attempts to assign the value of an undefined variable test to the variable fileName.

2. The link variable should also be a string. Your code stores the object returned by urllib.urlopen() in link, and get_thread() then formats that object into a string and calls urllib.urlopen() on the result.

3. You have added ".html" to the thread link. The link should end after the t parameter, i.e. "http://talk.maemo.org/showthread.php?t=73315".

Solution:

Code:
#!/usr/bin/python

import io
import urllib
import sys

file_name = "test"
link = "http://talk.maemo.org/showthread.php?t=73315"
start_page = 1
end_page = 255

def get_thread(file_name, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        #print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        with io.open("%s-page%d.html" % (file_name, num), 'w') as out:
            # The with block closes the file, so no explicit close() is needed
            out.write(unicode(page, 'utf-8'))

if __name__ == '__main__':
    # Use the values assigned above rather than command-line arguments
    sys.exit(get_thread(file_name, link, start_page, end_page))
In my earlier example, I had a mixture of styles for the variable names, so I changed fileName to file_name.
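
The two ways of supplying values seen in this thread, hard-coding them in the script or passing them on the command line, can also be combined. Here is a small sketch of one way to do that; read_settings and DEFAULTS are hypothetical names, and the default values are just the ones from the post above.

```python
# Hypothetical defaults, copied from the values hard-coded in this thread.
DEFAULTS = ("test", "http://talk.maemo.org/showthread.php?t=73315", 1, 255)


def read_settings(argv):
    """Return (file_name, link, start_page, end_page) from argv, or DEFAULTS."""
    args = argv[1:]  # argv[0] is the script name
    if not args:
        return DEFAULTS
    if len(args) != 4:
        raise ValueError("expected 4 arguments, got %d" % len(args))
    file_name, link, start_page, end_page = args
    # Command-line arguments arrive as strings, so convert the page numbers.
    return (file_name, link, int(start_page), int(end_page))
```

With this in place, the last line of the script could call get_thread(*read_settings(sys.argv)), so running it with no arguments uses the defaults and running it with four arguments overrides them.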

Last edited by marxian; 2011-10-11 at 17:34.
 

Posts: 99 | Thanked: 36 times | Joined on Mar 2010
#9
Firstly, marxian, a big thank you for your help; you certainly pointed me in a direction to get this working. The code below is yours, I just changed it to get it working for me.

I couldn't get your Python code to run, try as I might (probably more to do with my complete lack of understanding than your code). However, it definitely won't run on the N900 as is, because Python 2.5 doesn't include the io library.
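
Some background on why: the io module first appeared in Python 2.6, and on Python 2.5 the with statement also needs "from __future__ import with_statement" at the top of the file. If decoded text output is still wanted, codecs.open from the standard library is one alternative that exists on both old and new interpreters. A minimal sketch follows; save_page is a hypothetical helper name, and treating the downloaded bytes as UTF-8 is an assumption about the forum's encoding.

```python
import codecs


def save_page(path, raw_bytes, encoding="utf-8"):
    # Decode the downloaded bytes; anything invalid in the assumed encoding
    # is replaced rather than raising an error.
    text = raw_bytes.decode(encoding, "replace")
    out = codecs.open(path, "w", encoding="utf-8")
    try:
        out.write(text)
    finally:
        # try/finally instead of a with block, so no __future__ import is needed.
        out.close()
```

In the download loop this would be called as save_page("%s-page%d.html" % (fileName, num), page).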

Anyway, I've taken what you've given me and with some help (well, a lot of help) I've managed to get it working on the N900.

If anyone wants to use it, the steps needed are as follows.
(I'm writing this as an absolute beginner, so feel free to point out any errors.)

I'm assuming that Python 2.5 has been installed on your N900. If not, install it first.

First, using a text editor on your N900 (I use leafpad), copy the code below into it and save the file as getathread.py. Save it to your MyDocs folder so that the output will be easy to reach when we need it.

Code:
import urllib
import sys

def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        # Plain open() instead of io.open(): the io module is missing from Python 2.5
        out = open("%s-page%d.html" % (fileName, num), 'w')
        try:
            out.write(page)
        finally:
            out.close()
        print "downloaded: " + str(num)

if __name__ == '__main__':
    # Expect exactly four arguments: file name, link, start page, end page
    if len(sys.argv) != 5:
        print "Usage: python getathread.py FILENAME LINK START_PAGE END_PAGE"
        sys.exit(1)
    try:
        start_page = int(sys.argv[3])
        end_page = int(sys.argv[4])
    except ValueError:
        print "Use numbers for the last two arguments"
        sys.exit(1)
    get_thread(sys.argv[1], sys.argv[2], start_page, end_page)
Now open your terminal and change to the MyDocs directory by typing:

Code:
cd MyDocs
To run the script we need four arguments (pieces of information):

1. File name you want to save the output as
2. link to the thread
3. First page of thread to download
4. Last page of thread to download

We will pass this information to the script on the command line.

For this example, I will use the popular Kernel Power V49 thread.

So our arguments are:
1. power49
2. http://talk.maemo.org/showthread.php?p=1105192
3. 1
4. 186

So to run our code we would type the following in the terminal (the arguments are separated by single spaces, and the link is quoted so that the shell does not try to interpret the ? in it):

Code:
python getathread.py power49 "http://talk.maemo.org/showthread.php?p=1105192" 1 186
You should then get output on the screen saying:
grabbing page 1
downloaded: 1
grabbing page 2
downloaded: 2

and so on.

When finished, simply open any of the files from the file manager or from your browser.
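
With a couple of hundred files, a small index page can make it easier to jump between them. Below is a rough sketch of how one might be generated; build_index and write_index are hypothetical names, the index file name is an arbitrary choice, and it assumes a simple file name such as power49 with no characters that would need escaping in HTML.

```python
def build_index(file_name, start_page, end_page):
    # One link per downloaded page, using the same naming scheme as the script.
    lines = ["<html><body>", "<h1>%s</h1>" % file_name, "<ul>"]
    for num in range(int(start_page), int(end_page) + 1):
        page = "%s-page%d.html" % (file_name, num)
        lines.append('<li><a href="%s">Page %d</a></li>' % (page, num))
    lines.extend(["</ul>", "</body></html>"])
    return "\n".join(lines) + "\n"


def write_index(file_name, start_page, end_page):
    index_path = "%s-index.html" % file_name
    out = open(index_path, "w")
    try:
        out.write(build_index(file_name, start_page, end_page))
    finally:
        out.close()
    return index_path
```

For the example above, write_index("power49", 1, 186) would create power49-index.html in the current directory, next to the downloaded pages.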
 
