FindProxy

Overview

FindProxy examines the contents of files or web pages and extracts (and optionally tests) likely-looking proxy entries.
By default, a single test is applied to each proxy found: whether it can GET a reference web page intact. Originally designed to detect censoring web proxies, the program uses a guaranteed censored web page as this reference page by default. When the reference page is specified by the user (via '-r') with the value 'none', the proxies are extracted in the standard address:port format, but no tests are done. This will soon become the default mode of operation. StatProxy (another tool in the proxyTools distribution) now has equivalent functionality (tests 0 and 14).
By default, any proxy strings found whose ports are known to be blocked by the firewall are ignored. The firewall may be specified using the '-F' option, or determined automatically from the IP address of the test location. User-specified lists of ports to ignore may also be given.
By default, tests are carried out from the user's computer. The user may instead have the tests carried out via any accessible CONNECT proxy; by choosing such a proxy carefully, the user may conceal their real IP address.
By default, any initial proxy list web page is obtained by use of the local ISP's proxy. If that blocks access to the list site, a CONNECT proxy may be specified for this purpose too.
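
The default test described above (can a proxy GET the reference page intact?) might be sketched roughly as follows. This is a Python illustration only, not the Perl code findProxy actually uses. The names fetch_via_proxy and proxy_passes are made up for the example, and treating "intact" as "contains a known marker string" is an assumption; the real check may differ.

```python
import http.client
import urllib.request


def fetch_via_proxy(proxy, url, timeout=60):
    """GET url through an HTTP proxy given as 'address:port'; return the body bytes."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def proxy_passes(proxy, reference_url, marker, fetch=fetch_via_proxy):
    """Return True if the reference page arrives through the proxy containing marker.

    marker is a byte string known to appear in the uncensored page. This is an
    assumed stand-in for whatever "intact" means in the real tool.
    """
    try:
        body = fetch(proxy, reference_url)
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or garbled response: treat the proxy as unusable.
        return False
    return marker in body
```

The fetch parameter is there only so the comparison logic can be tried without touching the network; by default it performs a real GET with the 60 second timeout.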

FindProxy uses Perl regular expression matching to detect the likely proxy strings within the page/file content, allowing it to match many kinds of content structure, while still allowing easy maintenance for new page formats. Several dozen different web pages, proxy list file formats, mailing lists, bulletin boards etc. are currently interpreted correctly.
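
To give a feel for the regular expression approach, here is a minimal sketch that pulls plain address:port entries out of arbitrary text. It is written in Python for illustration and is not taken from findProxy.pl; the pattern and the name extract_proxies are assumptions, and the real program handles many more page formats than this one simple case.

```python
import re

# Hypothetical pattern: a dotted-quad IPv4 address, a colon, then a port number.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_proxies(text):
    """Return unique 'address:port' strings in order of first appearance."""
    found = []
    for host, port in PROXY_RE.findall(text):
        # Reject matches that look like addresses but have out-of-range parts.
        octets_ok = all(int(octet) <= 255 for octet in host.split("."))
        port_ok = 0 < int(port) <= 65535
        entry = f"{host}:{port}"
        if octets_ok and port_ok and entry not in found:
            found.append(entry)
    return found
```

Because the pattern only cares about the address:port shape, the same function works on HTML table cells, mailing list text or a bare list file, which is the property the paragraph above relies on.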

Unique amongst the proxyTools, this program requires a URL format for the specification of the location of the target content. The URL may specify a local file (file://, or file:) or a web location (http://, news:, etc.). Examples are given below.
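
As a rough illustration of why a URL is required, the scheme is what distinguishes a local file from a remote page (and hence whether a proxy is needed to fetch the list at all). The Python sketch below shows one way to make that distinction; classify_target is a hypothetical helper and does not reflect how findProxy.pl does it.

```python
from urllib.parse import urlsplit


def classify_target(url):
    """Return ('local', path) for file: URLs, or ('remote', scheme) otherwise."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        # Local content: no proxy is needed to read it.
        return ("local", parts.path)
    if parts.scheme:
        # http, ftp, news, etc.: content has to be fetched over the network.
        return ("remote", parts.scheme)
    raise ValueError("a URL scheme is required, e.g. file: or http://")
```

A bare file name such as listOfProxies.txt has no scheme and is rejected here, matching the requirement that the target be given in URL form.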

Like most of the proxyTools, FindProxy can use a configuration file so the user need not repeatedly type command line options.

Installation and operation of findProxy


I'm assuming you're using MS Windows with ActiveState Perl; users of other operating systems will
probably have no trouble following this anyway. I'm also assuming the configuration file is unmodified.

Unzip the proxyTools.zip package.

There are two ways to run this thing:
a) from the command line (a DOS shell) like:
perl findProxy.pl <options> <url>
or
b) from a shortcut/link. In this case, you'll probably want to edit
the options and url parameter into the script itself. Not recommended.

Note: if you add .pl to the PATHEXT environment variable under MS
Windows, and you have the standard '.pl' extension associated with
perl.exe, you won't need to type the 'perl ' part or the .pl
in the command line. So you can just type:
findProxy <options> <url>

Examples:

The <url> below may be a file in the current directory, like:
file:listOfProxies.txt
or it may be a web page, like:
http://www.angelfire.com/my/6waynes/checkedPublicProxies.html
or even ftp etc.

For clarification, the -p proxy is used to get the list in the
first place, so it's unnecessary if the url is a local 'file:'.
But this proxy is (by default) also used for the CONNECT connections,
so it's needed whenever the -C option is used (and, of course, must
be a CONNECT capable proxy). There is now an option to specify a
different proxy for this purpose.

Also note that any missing command line options will be defaulted
to some (usually) sensible value in the code - you can set these
once and for all by editing the configuration file.

Examples

perl findProxy.pl -h
will give a quick list of the options (and their defaults) available

Even more details are found by using the built-in documentation:
perldoc findProxy.pl
______________________

perl findProxy.pl <url>
will test proxies at <url> for those which are directly usable from
the UAE (by configuring into your web browser). A UAE proxy is used,
if necessary, using the normal GET to test them. Output is noisy,
timeout is 60 secs. Note the code defaults to a UAE proxy, so users
outside the UAE must at least add a -p <proxy> they have access to
(for non-file: URLs).
______________________

perl findProxy.pl -F UAE-dialup <url>
As above. Valid -F values are those listed in the firewalls.xml database.
______________________

perl findProxy.pl -F UAE-dialup -p http://194.170.168.236:8080/ <url>
As above (this is the proxy it's defaulting to anyway).
KSA people should use -F KSA-ISU *and* a -p <proxy> they can access.
______________________

perl findProxy.pl -F firewall-none -C -p http://194.170.168.236:8080/ <url>
will check a list of proxies from <url>, using a
CONNECT via the proxy at 194.170.168.236:8080 (a UAE proxy which
allows CONNECT). No proxies are excluded from the check. Oh ...
'bess' proxies are still excluded :-)
______________________

perl findProxy.pl -F UAE-dialup <url>
will check a list at <url>, ignoring all proxies on ports blocked
for UAE-dialup users.
______________________

perl findProxy.pl -i [portList] <url>
will check the list at <url> ignoring *only* the ports listed in
portList (as many as you like, separated by commas). If portList
contains only one port, the square brackets may be omitted.
This means the program can be used from any country/corporation
once the list of blocked ports is known.
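
The port list handling described above can be sketched like this. It is a Python illustration under stated assumptions, not the tool's own code: parse_port_list and drop_ignored are hypothetical names, and the proxies are assumed to already be in address:port form.

```python
def parse_port_list(spec):
    """Parse '[80,3128]' or a bare '8080' into a set of port numbers."""
    inner = spec.strip()
    # The square brackets are optional when only one port is given.
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    return {int(port) for port in inner.split(",") if port.strip()}


def drop_ignored(proxies, ignored_ports):
    """Keep only the 'address:port' entries whose port is not in ignored_ports."""
    return [p for p in proxies if int(p.rsplit(":", 1)[1]) not in ignored_ports]
```

For instance, drop_ignored(["1.2.3.4:80", "1.2.3.4:8080"], parse_port_list("[80,3128]")) keeps only the entry on port 8080.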
______________________

perl findProxy.pl -r none http://www.angelfire.com/my/6waynes/checkedPublicProxies.html
will extract the proxies found in the page specified and list them in standard format (saving to a file by default). No tests will be done.

Let me know about any bugs. Feel free to hack the code around (and send me any useful
patches).

Have fun
wayne@nym.alias.net