	                   GETLEFT

Introduction

Once upon a time I tried to download one month's worth of a mailing
list archive. I tried to do it with Getright, but that program can
only process up to 500 links in a web page, and this archive had well
over one thousand. At the same time I learned that Tcl could download
files, so I thought "How hard can it be to make my own program?".

So here is my little effort, or at least a very early alpha version
of it. It is supposed to download a given Web site: you give it a URL,
and down it goes, happily downloading every linked URL in that site.

As it goes, it modifies the original pages: all the links get changed
to relative links, so that you can browse the site on your hard disk
without those pesky absolute links.

This is an alpha release, which means it may behave erratically or
fail to work at all. It kind of works for me, but be advised it may
not for you.


Author

Loath as I am to admit it, this is the brainchild of Andres Garcia;
you can contact me at:

ornalux@redestb.es

You may send all kinds of bug reports, URLs that
don't work, patches, or even lots of money.


Requirements

It works on Linux and Windows, and I expect other Unix variants will
be fine too. It looks and works better on Linux, though.

The program requires Tcl/Tk 8.1. It will, most definitely, not 
work with earlier versions due to the regular expressions used.

In Linux you can find out which version you have by running:

wish
% puts $tcl_version

In most cases it will print something like 8.0; as the 8.1 release
is quite recent, it has not yet been included in many distributions.
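If you prefer a non-interactive check, something like this should
work too (assuming the 'tclsh' shell came along with your Tcl
installation; it prints a notice if it is not there):

```shell
# Print the Tcl version without opening the wish window
if command -v tclsh >/dev/null 2>&1; then
    echo 'puts $tcl_version' | tclsh
else
    echo 'tclsh not found'
fi
```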

In Windows, if you do not know what version you have, you probably
do not have any.

If you don't know what I am talking about, check
www.scriptics.com, the home of the company that makes the Tcl/Tk
interpreter (don't panic, it's free).

To do the actual downloading, Getleft uses the program 'curl'; you
can get it from the web page of its author, Daniel Stenberg:

http://www.fts.frontec.se/~dast/curl/

And don't you worry, it's also free (Am I cheap or what?)

In the Unix/Linux world that's it; for Windows you also need the
executables found in the win.exe self-extracting archive. Put these
files, together with curl for Windows, in the same directory where
'Getleft.tcl' is.

These files for Windows come from the Cygwin project by Cygnus:

http://sourceware.cygnus.com/cygwin/

Menus

The following is a short description of what the options in the
menus do, or at least what they are meant to do.


File menu


Enter URL

A small dialog box appears; whatever is in the clipboard will appear
in the entry box.

So if you are surfing the Internet and see something you like, copy
the URL you are at, fire up Getleft, and you will find the URL ready
for downloading.

Opening the combobox gives you the last URLs you entered.

If you give the URL of a directory or site, be sure to include a '/'
at the end; for example, do not write:

www.foo.com

but:

www.foo.com/

After you give the URL to download, you will be asked where to store
the bounty; you have to give either an existing directory or a place
to create a new one.

Then the first file is downloaded and processed; after that, the
program shows a dialog box with all the links in that first page, and
you get to choose which ones to follow and which to ignore.

Then the real downloading begins, link after link, in quite a boring
fashion. The dialog that shows the download of a file has a button to
pause and resume the download. It doesn't work that well: the program
doesn't check whether the server actually allows resuming, and even
when it is supposed to, the result may not be the expected one.

If you minimize the downloading window, the following windows will
also appear minimized, until you maximize one of them.

There is an error log, a file called 'geterror.log'. It is not very
useful yet, and if you are connecting through a proxy it is useless;
hopefully it will improve. If any error is detected during the
download, a window will appear at the end of the downloading to show
you this error log.

Site Map

Use this command to get a map of the Web site. To begin with, Getleft
will download all the HTML files and active content pages in the site
and then present you with a dialog box in which you can choose which
files will be downloaded.

In theory, this could run out of memory; let me know if it also
happens in practice.

Stop

After this page: stops after downloading all the links in the current page.
After this file: stops after downloading the current file.

Exit

Well, it exits the program.


Options menu


Up links

Choose whether you want to follow (the default) or ignore the links
to pages that are above the current one in the site's directory
structure.


Levels

The levels of links you want the program to follow.

The default is 'no limit'; '0' will download the page but none of its
links, not even the graphics.

Filter Files

Only Html

If you check this option, only HTML and active content pages will be
downloaded.

Choose filter

A dialog box appears in which you can choose which types of files you
do not want to download.

CGI

If you want to follow links that go through CGI scripts, check this
option. The program is not that intelligent: it only identifies CGI
scripts when the link includes a '?' to pass parameters.
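The rule above can be sketched in shell (Getleft itself is written in
Tcl; this little function is just an illustration of the '?' test,
and the URLs are made-up examples):

```shell
# A link is treated as a CGI call only when the URL carries a '?',
# i.e. a query string passing parameters to a script.
is_cgi() {
    case "$1" in
        *\?*) echo yes ;;   # has parameters: treated as CGI
        *)    echo no  ;;
    esac
}

is_cgi 'http://www.foo.com/search?query=tcl'   # prints "yes"
is_cgi 'http://www.foo.com/index.html'         # prints "no"
```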


Use proxy

Check this option if you need to use a proxy to access the Internet.


Tools menu


Purge files

This option allows you to recursively scan through a directory tree,
deleting files that match a certain pattern and substituting them
with empty files of the same name.

This is useful if a site takes more than one session to download. For
example, imagine you use two computers: one with a fast, reliable, or
at least cheap, Internet connection, like the computer at work or at
college, and another one, your very own, without it. You can download
the sites on the first one and take them to the second on floppies,
Zips,... With this option you can free space on the first computer,
and Getleft, seeing that the files already exist, won't try to
download them again.
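In shell terms, the purge amounts to something like this sketch
(Getleft itself does it in Tcl; the 'site' directory and the '*.gif'
pattern here are invented for the example):

```shell
# Build a tiny demo tree to purge
mkdir -p site/images
echo "some image data"   > site/images/logo.gif
echo "<html>...</html>"  > site/index.html

# Replace every file matching the pattern with an empty file of the
# same name, recursively; the names survive, the contents go.
find site -type f -name '*.gif' -exec sh -c ': > "$1"' _ {} \;

ls -l site/images/logo.gif   # still there, but now 0 bytes
```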


Restore orig

As the files get changed to keep only relative paths, the originals
are kept with an '.orig' extension. This command deletes the modified
files and renames the originals back to their real names.

This is useful if you actually want to mirror a site. This program,
though, is not a good tool for mirroring sites.
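The restore can be sketched in shell like this (again, the real code
is Tcl; the 'mirror' directory and file contents are made-up):

```shell
# Build a tiny demo: a rewritten page plus its saved original
mkdir -p mirror
echo "rewritten, relative links" > mirror/index.html
echo "original, absolute links"  > mirror/index.html.orig

# For every saved '.orig' file: delete the rewritten copy and rename
# the original back to its real name.
find mirror -type f -name '*.orig' | while read -r f; do
    rm -f "${f%.orig}"
    mv "$f" "${f%.orig}"
done

cat mirror/index.html   # prints "original, absolute links"
```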



Configure proxy

A dialog box appears in which you can enter the address of your
proxy. You can check the address in the configuration of your
browser; failing that, you can ask your network administrator.


Languages

You get to choose which language the program will use; at present
only English and Spanish are supported.



Help Menu


Manual

Shows this text.


License

Shows the GNU license; basically, you can do whatever you want with
the program except claim that you wrote it yourself or change the
license.

It should also be very clear that the program comes with NO WARRANTY
whatsoever. In fact, I would be very surprised if it happens to work at all.


About

Shows some info about the program.

