Exercises 13.16 Multiple Choice Questions

1.

    Q-1: What protocol can be used to retrieve web pages using Python?
  • urllib
  • urllib is a Python library containing several modules for working with URLs; it is not a protocol.
  • bs4
  • bs4 is a Python library for pulling data out of HTML files.
  • HTTP
  • HTTP is a network protocol used to transmit documents such as HTML.
  • GET
  • GET is an HTTP request method used to request a specified resource from a server.

2.

    Q-2: What provides two-way communication between two different programs in a network?
  • socket
  • A socket is an endpoint that a program can use to send and receive data over a network.
  • port
  • A port is a numbered endpoint on a computer that identifies which application a connection is for.
  • http
  • HTTP is a protocol used for transferring data from a web server.
  • protocol
  • A protocol is a set of rules that determines how data is transmitted over a network.
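
The two-way nature of a socket connection can be sketched without a network. This example assumes `socket.socketpair()` as a stand-in for two separate programs: it returns two sockets that are already connected to each other, and each side both sends and receives.

```python
import socket

# socket.socketpair() returns two already-connected sockets. Here they
# stand in for two programs on a network (an assumption to stay offline).
left, right = socket.socketpair()

left.sendall(b"ping")          # program A sends
request = right.recv(1024)     # program B receives

right.sendall(b"pong")         # program B replies on the same connection
reply = left.recv(1024)        # program A receives the reply

left.close()
right.close()

print(request, reply)
```

The same connection carries data in both directions, which is what distinguishes a socket from a port number or a protocol.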

3.

    Q-3: What is a Python library that can be used to send and receive data over HTTP?
  • http
  • HTTP is a protocol, not a Python library.
  • urllib
  • urllib can be used to send and receive data over HTTP from a program instead of doing it manually in a web browser.
  • port
  • A port is an endpoint through which a device connects with other devices in a network to transmit a particular type of data.
  • header
  • A header is additional information sent and received along with the data.
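
As a rough illustration of how urllib describes a request and its headers, the sketch below builds two `urllib.request.Request` objects without sending them. The `User-Agent` value and the form field `q` are made up for this example; nothing goes over the network until a request is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

url = "http://data.pr4e.org/romeo.txt"

# A request with no body defaults to the GET method.
# The User-Agent value is a hypothetical example header.
get_req = urllib.request.Request(url, headers={"User-Agent": "pr4e-quiz-example"})

# Supplying a bytes body switches the default method to POST.
# The form field "q" is illustrative only.
form = urllib.parse.urlencode({"q": "romeo"}).encode()
post_req = urllib.request.Request(url, data=form)

# Request stores header names capitalized, hence "User-agent" here.
print(get_req.get_method(), get_req.get_header("User-agent"))
print(post_req.get_method(), post_req.data)
```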

4.

    Q-4: What is the process by which search engines retrieve web pages and build a search index called?
  • scrape
  • Scraping is the act of extracting data from web pages.
  • parse
  • Parsing is breaking scraped web pages down into useful data.
  • BeautifulSoup
  • BeautifulSoup is a Python library for extracting data from HTML documents.
  • spider
  • A spider retrieves a web page and then all the web pages linked to it to form a search index.
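
The spidering idea (fetch a page, index it, then follow its links) can be sketched over a tiny in-memory "web". Everything here is hypothetical: the `PAGES` dictionary replaces real HTTP requests, `crawl` is an illustrative name, and the regular expressions are a crude substitute for a real HTML parser.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch
# each page over HTTP instead of reading from this dictionary.
PAGES = {
    "/index": '<a href="/a">A</a> <a href="/b">B</a> home page',
    "/a": '<a href="/b">B</a> apples',
    "/b": '<a href="/index">home</a> bananas',
    "/orphan": "never linked, so never visited",
}

def crawl(start):
    """Breadth-first crawl from start; returns a word -> set-of-URLs index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = PAGES.get(url, "")
        text = re.sub(r"<[^>]+>", " ", html)          # crude tag stripping
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:                       # avoid revisiting pages
                seen.add(link)
                queue.append(link)
    return index

index = crawl("/index")
print(sorted(index["bananas"]))
```

Pages that nothing links to, such as `/orphan` here, never enter the index, which is why a spider depends on links to discover content.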

5.

    Q-5: What does the following block of code do?
    import socket
    
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect(('data.pr4e.org', 80))
    cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
    mysock.send(cmd)
    
  • It sends a request to retrieve 'romeo.txt' from 'data.pr4e.org'
  • This sends a GET request to the web server over port 80.
  • It sends the 'romeo.txt' file to 'data.pr4e.org'
  • This does not send a file to the web server.
  • It creates a file named 'romeo.txt'
  • This does not create a file.
  • It throws an error because a socket cannot use HTTP.
  • Sockets can be used to connect to different types of servers using different protocols.
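
The snippet in this question only sends the request. A program would then collect the reply with repeated `mysock.recv(...)` calls until an empty result signals the end. The sketch below shows what could be done with those bytes once collected; the `raw` value is a hand-written sample shaped like an HTTP/1.0 reply, not a captured response from data.pr4e.org.

```python
# Sample bytes shaped like an HTTP/1.0 reply. The header values are
# illustrative, not a captured response from the real server.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"But soft what light through yonder window breaks\n"
)

# A blank line (\r\n\r\n) separates the headers from the body.
head, _, body = raw.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode().split("\r\n")

version, status, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(status, reason)
print(headers)
print(body.decode(), end="")
```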

6.

    Q-6: What does the following block of code do?
    import urllib.request
    
    fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
    for line in fhand:
        print(line.decode().strip())
    
  • It creates a file named 'romeo.txt' in 'data.pr4e.org'
  • urllib.request cannot create files on a web server.
  • It finds the URLs linked to 'data.pr4e.org' and prints them.
  • urllib.request is not a spider.
  • It opens a file named 'http://data.pr4e.org/romeo.txt' in local storage
  • The argument is an 'http://' URL, so urllib.request fetches it over the network rather than opening a file in local storage.
  • It prints the contents of 'romeo.txt' after retrieving it from 'data.pr4e.org'
  • urllib.request requests the file, and the loop decodes and prints each line of the response.
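
The `decode()` and `strip()` calls in this question matter because the object returned by `urlopen` yields bytes, with the newline still attached. A small offline sketch, assuming `io.BytesIO` as a stand-in for that response object and sample text in place of a live download:

```python
import io

# io.BytesIO stands in for the object urlopen() returns: both yield
# bytes lines when iterated. The text is a sample, not a live download.
fhand = io.BytesIO(b"But soft what light\nthrough yonder window breaks\n")

raw_lines = list(fhand)                                    # bytes, newline included
clean_lines = [line.decode().strip() for line in raw_lines]  # str, newline removed

print(raw_lines[0])
print(clean_lines[0])
```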

7.

    Q-7: What does the following block of code do?
    import urllib.request, urllib.parse, urllib.error
    
    img = urllib.request.urlopen('http://data.pr4e.org/cover3.jpg').read()
    fhand = open('cover3.jpg', 'wb')
    fhand.write(img)
    fhand.close()
    
  • It retrieves 'cover3.jpg' and saves it to your computer.
  • Running the code does not display any output because it saves the file to your computer.
  • It displays the image 'cover3.jpg'.
  • It does not output anything on the screen.
  • It retrieves the URL to download 'cover3.jpg'
  • urllib retrieves the file itself, not just its URL, and the code writes it to disk.
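
The code in this question reads the whole image into memory with a single `read()`. For larger files, one possible variation is to copy the data in fixed-size chunks. In the sketch below, `save_stream` is a hypothetical helper, and an `io.BytesIO` object filled with placeholder bytes stands in for the network response.

```python
import io
import os
import tempfile

def save_stream(source, path, chunk_size=100_000):
    """Copy a binary file-like object to path in chunks; return bytes written."""
    total = 0
    with open(path, "wb") as out:          # 'wb' = write raw bytes, no decoding
        while True:
            chunk = source.read(chunk_size)
            if not chunk:                  # empty bytes means end of stream
                break
            out.write(chunk)
            total += len(chunk)
    return total

# Placeholder data. With a network connection, `source` could instead be
# urllib.request.urlopen('http://data.pr4e.org/cover3.jpg').
fake_image = io.BytesIO(b"\xff\xd8" + bytes(250_000))
target = os.path.join(tempfile.mkdtemp(), "cover3.jpg")
written = save_stream(fake_image, target)
print(written, "bytes written")
```

Like the original, this prints nothing about the image itself; the only visible effect is the file on disk.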

8.

    Q-8: What does the following regex match?
    http[s]?://.+?
    
  • Exact match to 'http[s]?://.+?'
  • The regex uses special characters, so it does not match the literal text 'http[s]?://.+?'.
  • 'http://' or 'http[s]://' followed by one or more characters
  • The square brackets denote a character class, so '[s]?' matches zero or one 's', not the literal text '[s]'.
  • 'http://' or 'https://' followed by one or more characters.
  • The '[s]?' means zero or one 's', and '.+?' means one or more characters, matched non-greedily.
  • 'https://' followed by one or more characters.
  • The regex also accepts 'http://' because '[s]?' means 'http' followed by zero or one 's'.
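
A quick check of the pattern with `re.findall` shows a side effect of the non-greedy `.+?`: with nothing after it in the pattern, it stops after a single character. The sample text and the second, greedier pattern are assumptions added for comparison.

```python
import re

text = "see http://a.com and https://b.org"

# The quiz pattern: '.+?' is non-greedy, so it takes only one character.
lazy = re.findall(r"http[s]?://.+?", text)

# An assumed alternative that keeps going until whitespace.
greedy = re.findall(r"https?://\S+", text)

print(lazy)    # ['http://a', 'https://b']
print(greedy)  # ['http://a.com', 'https://b.org']
```

Both patterns accept 'http://' and 'https://'; they differ only in how much of what follows they capture.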

9.

    Q-9: What does the following block of code do?
    url = "https://www.nytimes.com"
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    
  • Retrieves and displays the web page
  • This does not display the web page. BeautifulSoup parses the web page retrieved by urllib.request.
  • Parses the HTML content of the "https://www.nytimes.com" web page.
  • This parses all the HTML tags and contents of the web page.
  • Downloads the web page
  • This does not save files to the computer.

10.

    Q-10: What does the following block of code print?
    import urllib.request
    from bs4 import BeautifulSoup
    
    url = "https://www.nytimes.com/"
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    
    tags = soup('img')
    for tag in tags:
        print(tag.get('src', None))
    
  • Retrieves and displays the web page
  • urllib retrieves the web page but does not display it.
  • Downloads the web page
  • This does not save files to the computer.
  • Prints the images from 'www.nytimes.com'
  • BeautifulSoup and html.parser cannot display images.
  • Prints the 'src' attribute of every 'img' tag on 'www.nytimes.com'
  • It prints the image sources listed under the 'src' attribute of the 'img' tags.
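
The same idea can be tried offline with only the standard library. This sketch swaps BeautifulSoup for `html.parser.HTMLParser` and uses a made-up HTML string instead of a live page; `ImgSrcCollector` and `SAMPLE` are hypothetical names. As with `tag.get('src', None)`, an `img` tag without a `src` attribute yields `None`.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs; .get returns None
            # when the tag has no src attribute.
            self.sources.append(dict(attrs).get("src"))

# Made-up markup standing in for a downloaded page.
SAMPLE = '<p><img src="/a.png" alt="A"><img alt="no source"><img src="b.jpg"></p>'

collector = ImgSrcCollector()
collector.feed(SAMPLE)
print(collector.sources)
```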