class - urllib2

Last updated: 2020-12-11

 

 


python3

 

from urllib.error import HTTPError
import urllib.request

url = 'https://datahunter.org'

try:
    response = urllib.request.urlopen(url)
    response_status = response.status  # e.g. 200; redirects are followed automatically
    mtime = response.getheader('Last-Modified')
    response_content = response.read()
except HTTPError as error:
    response_status = error.code  # 404, 500, etc
   

 


OLD (Python 2, urllib2)

urllib2.urlopen(url[, data][, timeout])

 

 !! Before Python 2.7.9, HTTPS requests do not do any verification of the server's certificate.

* HTTP/1.1
* Connection:close header included

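data

A string specifying additional data to send to the server. Default: None.
When data is given, the request is sent as POST instead of GET. The string should be in application/x-www-form-urlencoded format (e.g. built with urllib.urlencode()).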
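
A minimal sketch of how data switches the request method. It assumes Python 3, where urllib2 became urllib.request and data must be bytes rather than a str. The form fields are made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://www.example.com/"

# Hypothetical form fields, encoded as application/x-www-form-urlencoded.
form = {"name": "alice", "page": 2}
payload = urlencode(form).encode("ascii")

get_req = Request(url)                  # no data: plain GET
post_req = Request(url, data=payload)   # data present: becomes POST

print(get_req.get_method())    # GET
print(post_req.get_method())   # POST
print(post_req.data)           # b'name=alice&page=2'
```

Passing post_req to urlopen() would then send the encoded fields as the request body.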

Methods of the returned response object:

geturl()              # return the URL of the resource retrieved

http://datahunter.org/availability.json

info()                 # return the meta-information of the page

Date: Sun, 21 Sep 2014 01:50:31 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Sat, 20 Sep 2014 18:30:35 GMT
ETag: "6f394b-5d8-5038369de596e"
Accept-Ranges: bytes
Content-Length: 1496
Connection: close
Content-Type: text/plain
X-Pad: avoid browser bug

Get a single header

r.info().getheader('Last-Modified')

getcode()           # return the HTTP status code of the response.

200
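
The three methods above can be tried without touching a real site. This is only a sketch, assuming Python 3 (urllib.request instead of urllib2). DemoHandler and inspect_response are hypothetical names, and the throwaway server on 127.0.0.1 stands in for datahunter.org. In Python 3, info() returns a message object, so a single header is read with .get() rather than getheader().

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one small JSON document."""

    def do_GET(self):
        body = b'{"available": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Last-Modified", "Sat, 20 Sep 2014 18:30:35 GMT")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def inspect_response():
    # Port 0 lets the OS pick a free port for the throwaway server.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/availability.json" % server.server_port
    try:
        with urlopen(url, timeout=4) as r:
            return {
                "url": r.geturl(),                              # URL actually retrieved
                "code": r.getcode(),                            # HTTP status code
                "last_modified": r.info().get("Last-Modified"), # one header
                "body": r.read(),
            }
    finally:
        server.shutdown()
        server.server_close()
```

Calling inspect_response() returns a dict with the retrieved URL, the status code, the Last-Modified header and the raw body.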

Exceptions

  • URLError
  • HTTPError
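
HTTPError is a subclass of URLError, so the order of the checks matters: test for HTTPError first, otherwise the URLError branch swallows it. A small sketch, assuming Python 3 (urllib.error). The describe helper and the two hand-built exceptions are illustrative only, and no request is made.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe(exc):
    """Turn a urllib exception into a short label."""
    # HTTPError must be tested first: it is a subclass of URLError.
    if isinstance(exc, HTTPError):
        return "HTTP %d" % exc.code
    if isinstance(exc, URLError):
        return "unreachable: %s" % exc.reason
    raise TypeError("not a urllib error")


# Hand-built exceptions, standing in for what urlopen() would raise.
not_found = HTTPError("http://www.example.com/missing", 404, "Not Found",
                      Message(), io.BytesIO())
refused = URLError("connection refused")

print(issubclass(HTTPError, URLError))  # True
print(describe(not_found))              # HTTP 404
print(describe(refused))                # unreachable: connection refused
```

The same ordering applies to except clauses: put except HTTPError before except URLError.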

 

Request Objects

class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

Request.add_data(data)
Request.get_data()
Request.has_data()

header

  • Request.has_header(header)
  • Request.header_items()

Adding HTTP headers:

# Use the headers argument to the Request constructor, or:

import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
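
The constructor form mentioned above can be sketched as follows, assuming Python 3 (urllib.request). The User-Agent value is made up. One thing to watch: Request stores header names via str.capitalize(), so "User-Agent" is kept as "User-agent", and has_header() / get_header() compare against that stored form.

```python
from urllib.request import Request

# Headers passed to the constructor instead of add_header().
req = Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/"},
)
req.add_header("User-Agent", "demo-client/0.1")  # hypothetical value

print(req.has_header("Referer"))      # True
print(req.get_header("Referer"))      # http://www.python.org/
print(req.has_header("User-agent"))   # True  (stored, capitalized form)
print(req.has_header("User-Agent"))   # False (lookup is case-sensitive)
print(sorted(req.header_items()))
```

No request is sent here; passing req to urlopen() would send both headers.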

 


URLError, urlopen

 

Usage

urllib2.urlopen(url[, data][, timeout])

from urllib2 import URLError, urlopen

mylink = "http://datahunter.org/availability.json"

try:
    response = urlopen(mylink, timeout=4)
except URLError as e:
    # urlopen failed, so there is no response object to read or close
    print("Link Error: " + str(e))
else:
    body = response.read()
    response.close()
    print(body)

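* Pay attention to the timeout setting. Default: the global socket default (socket.getdefaulttimeout()), which is None (wait forever) unless it has been changed.

When a read on an HTTPS connection exceeds the timeout, the error looks like:

ssl.SSLError: The read operation timed out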
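
A timeout is not always wrapped in URLError: one that fires while waiting for the response can surface as a bare socket.timeout. Below is a sketch of a wrapper that treats both the same way, assuming Python 3 (urllib.request). fetch_or_none is a hypothetical helper name, and depending on the Python version other errors such as ssl.SSLError may also need to be listed.

```python
import socket
from urllib.error import URLError
from urllib.request import urlopen


def fetch_or_none(url, timeout=4.0):
    """Return the response body as bytes, or None on a network failure."""
    try:
        # Always pass an explicit timeout; the default may block forever.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except (URLError, socket.timeout) as exc:
        # URLError also covers HTTPError (4xx/5xx), which is its subclass.
        print("Link Error: " + str(exc))
        return None
```

Usage: body = fetch_or_none("http://datahunter.org/availability.json", timeout=4) returns the bytes on success and None on a connection error, an HTTP error or a timeout.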

 

 

 

 

 

 
