Last updated: 2020-12-11
python3
from urllib.error import HTTPError
import urllib.request

url = 'https://datahunter.org'
try:
    response = urllib.request.urlopen(url)
    response_status = response.status  # 200, 301, etc.
    mtime = response.getheader('Last-Modified')
    response_content = response.read()
except HTTPError as error:
    response_status = error.code  # 404, 500, etc.
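Since the sample above keeps the Last-Modified value in mtime, a natural follow-up is a conditional GET. This is a minimal sketch, assuming the server honours If-Modified-Since and answers 304 Not Modified (the stored_mtime value is just the header shown further down in these notes):

import urllib.request
from urllib.error import HTTPError

url = 'https://datahunter.org'
stored_mtime = 'Sat, 20 Sep 2014 18:30:35 GMT'  # e.g. the mtime saved from an earlier response

req = urllib.request.Request(url, headers={'If-Modified-Since': stored_mtime})
try:
    response = urllib.request.urlopen(req)
    response_content = response.read()           # server returned a newer copy
except HTTPError as error:
    if error.code == 304:                        # not modified since stored_mtime
        response_content = None                  # keep whatever copy was cached locally
    else:
        raise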
OLD (Python 2 urllib2)
urllib2.urlopen(url[, data][, timeout])
!! HTTPS requests do not do any verification of the server’s certificate.
* HTTP/1.1
* Connection:close header included
data
a string specifying additional data to send to the server, Default: None
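When data is given, the request is sent as a POST instead of a GET. A minimal sketch using the Python 3 urllib.request spelling (the httpbin.org endpoint and the form fields are illustrative assumptions, not from these notes; urllib2 took a plain str instead of bytes):

import urllib.parse
import urllib.request

# data must already be URL-encoded; in Python 3 it must also be bytes
payload = urllib.parse.urlencode({'q': 'datahunter'}).encode('ascii')

# with data present, urlopen sends a POST
response = urllib.request.urlopen('https://httpbin.org/post', data=payload, timeout=4)
print(response.status)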
Methods of the response object (a combined sketch follows below):
geturl() # return the URL of the resource retrieved
http://datahunter.org/availability.json
info() # return the meta-information of the page
Date: Sun, 21 Sep 2014 01:50:31 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Sat, 20 Sep 2014 18:30:35 GMT
ETag: "6f394b-5d8-5038369de596e"
Accept-Ranges: bytes
Content-Length: 1496
Connection: close
Content-Type: text/plain
X-Pad: avoid browser bug
Get a single header:
r.info().getheader('Last-Modified')
getcode() # return the HTTP status code of the response.
200
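A combined sketch exercising all three methods; the legacy geturl()/info()/getcode() names are still available on the object returned by urllib.request.urlopen in Python 3 (the URL is the availability.json link used elsewhere in these notes):

import urllib.request

r = urllib.request.urlopen('http://datahunter.org/availability.json', timeout=4)
print(r.geturl())                      # final URL after any redirects
print(r.info())                        # all response headers, as an http.client.HTTPMessage
print(r.getcode())                     # 200
print(r.info().get('Last-Modified'))   # one header; in Python 3, r.getheader() works too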
Exceptions
- URLError
- HTTPError
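HTTPError is a subclass of URLError, so it has to be caught first. A minimal Python 3 sketch (under urllib2 only the imports differ):

from urllib.error import HTTPError, URLError
import urllib.request

try:
    response = urllib.request.urlopen('http://datahunter.org/availability.json', timeout=4)
except HTTPError as e:      # the server answered, but with an error status
    print('HTTP error:', e.code)
except URLError as e:       # no connection, bad hostname, timeout, ...
    print('URL error:', e.reason)
else:
    print(response.read())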
Request Objects
class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
Request.add_data(data)
Request.get_data()
Request.has_data()
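add_data()/get_data()/has_data() only exist on the Python 2 urllib2.Request; in Python 3 the data attribute is read and written directly. A small sketch (the payload is an illustrative assumption):

import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.data = b'key=value'         # roughly what add_data() did; also makes the request a POST
print(req.data)                 # get_data()
print(req.data is not None)     # has_data()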
Header methods:
- Request.has_header(header)
- Request.header_items()
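has_header() and header_items() are still present on urllib.request.Request in Python 3; a short sketch:

import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
print(req.has_header('Referer'))   # True
print(req.header_items())          # [('Referer', 'http://www.python.org/')]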
Adding HTTP headers:
# Use the headers argument to the Request constructor, or:
import urllib2

req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
URLError, urlopen
Usage
urllib2.urlopen(url[, data][, timeout])
from urllib2 import URLError, urlopen

mylink = "http://datahunter.org/availability.json"
try:
    response = urlopen(mylink, timeout=4)
except URLError, e:
    print "Link Error: " + str(e)
else:
    json = response.read()
    response.close()
    print json
* Pay attention to the timeout setting. If it is omitted, the global default timeout is used (socket.getdefaulttimeout(), which means no timeout at all unless socket.setdefaulttimeout() has been called).
ssl.SSLError: The read operation timed out
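The timeout also applies while the body is being read, which is where the ssl.SSLError above comes from. A minimal Python 3 sketch that handles both the connect-phase and the read-phase timeout (the exception raised during read is socket.timeout, an alias of TimeoutError on current Pythons):

import socket
import urllib.request
from urllib.error import URLError

try:
    response = urllib.request.urlopen('http://datahunter.org/availability.json', timeout=4)
    body = response.read()       # the same 4-second timeout governs this read
except URLError as e:            # timeout while connecting is wrapped in URLError
    print('connect failed:', e.reason)
except socket.timeout:           # timeout while reading the body
    print('read timed out')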