
Section 13.18 Write Code Exercises

Checkpoint 13.18.1.

Complete the following code that retrieves the file 'romeo.txt' from the host 'data.pr4e.org'. Make changes to lines 4 and 5.
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('____', __))
cmd = '____ ________________________ HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(),end='')

mysock.close()
Solution.
Complete the following code that retrieves the file 'romeo.txt' from the host 'data.pr4e.org'. Make changes to lines 4 and 5.
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(),end='')

mysock.close()

Checkpoint 13.18.2.

Complete the following code to extract the image 'cover3.jpg' from the URL 'http://data.pr4e.org/cover3.jpg' on the host 'data.pr4e.org'. There are five blanks to fill in.
import socket

HOST = '________'
PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((____, ____))
mysock.sendall(b'___ __________________ HTTP/1.0\r\n\r\n')
count = 0
picture = b""

while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    #time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data

mysock.close()

pos = picture.find(b"\r\n\r\n")
picture = picture[pos+4:]
fhand = open("stuff.jpg", "wb")
fhand.write(picture)
fhand.close()
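Solution.
One possible completion, a sketch assuming the host 'data.pr4e.org' and the URL 'http://data.pr4e.org/cover3.jpg' given in the prompt:
import socket

HOST = 'data.pr4e.org'
PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((HOST, PORT))
mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""

while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    #time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data

mysock.close()

# Strip the HTTP headers (everything up to the first blank line) and keep the image body
pos = picture.find(b"\r\n\r\n")
picture = picture[pos+4:]
fhand = open("stuff.jpg", "wb")
fhand.write(picture)
fhand.close()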

Checkpoint 13.18.3.

Complete the following code that retrieves the text from 'http://data.pr4e.org/clown.txt', prints it, and also prints the frequency of each word.
import urllib.request

fhand = urllib.request.urlopen('_________________')
Solution.
Complete the following code that retrieves the text from 'http://data.pr4e.org/clown.txt', prints it, and also prints the frequency of each word.
import urllib.request

fhand = urllib.request.urlopen('http://data.pr4e.org/clown.txt')
counts = dict()
for line in fhand:
    line = line.decode().strip()
    print(line)
    # Split the line into words and tally each one
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1
print(counts)

Checkpoint 13.18.5.

Write a program to store the image file from 'http://data.pr4e.org/cover.jpg' on your disk.
import urllib.request, urllib.parse, urllib.error
Solution.
Write a program to store the image file from 'http://data.pr4e.org/cover.jpg' on your disk.
import urllib.request, urllib.parse, urllib.error

img = urllib.request.urlopen('http://data.pr4e.org/cover.jpg').read()
fhand = open('cover.jpg', 'wb')
fhand.write(img)
fhand.close()

Checkpoint 13.18.6.

Complete the following program to extract all URLs from the webpage using a regular expression.
import urllib.request, urllib.parse, urllib.error
import re

url = "https://www.nytimes.com"
html = _______________________
links = _______(b'href="(http[s]?://.*?)"', html)
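Solution.
One possible completion, a sketch assuming urllib.request.urlopen to fetch the page and re.findall to apply the given pattern; the final print loop is only illustrative:
import urllib.request, urllib.parse, urllib.error
import re

url = "https://www.nytimes.com"
html = urllib.request.urlopen(url).read()
links = re.findall(b'href="(http[s]?://.*?)"', html)
for link in links:
    print(link.decode())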

Checkpoint 13.18.7.

Write a program that retrieves a text file from 'https://www.gutenberg.org/files/1342/1342-0.txt' in blocks of 100,000 characters, joins them, saves the result as 'prideandprejudice.txt' on disk, and prints the number of characters.
import urllib.request, urllib.parse, urllib.error

txt = urllib.request.urlopen('___________________')

size = 0
while True:
    info = txt.read(100000)
    if len(info) < 1: break
    size = size + len(info)
Solution.
Write a program that retrieves a text file from 'https://www.gutenberg.org/files/1342/1342-0.txt' in blocks of 100,000 characters, joins them, saves the result as 'prideandprejudice.txt' on disk, and prints the number of characters.
import urllib.request, urllib.parse, urllib.error

txt = urllib.request.urlopen('https://www.gutenberg.org/files/1342/1342-0.txt')
fhand = open('prideandprejudice.txt', 'wb')
size = 0
while True:
    info = txt.read(100000)
    if len(info) < 1: break
    size = size + len(info)
    fhand.write(info)

print(size, 'characters copied.')
fhand.close()

Checkpoint 13.18.8.

Write a program that retrieves a text file from 'https://www.gutenberg.org/files/16/16-0.txt' in blocks of 100,000 characters, joins them, saves the result as 'peterpan.txt' on disk, and prints the number of characters.
import urllib.request, urllib.parse, urllib.error
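Solution.
A sketch that follows the same pattern as the solution to Checkpoint 13.18.7, with only the source URL and the output filename changed:
import urllib.request, urllib.parse, urllib.error

txt = urllib.request.urlopen('https://www.gutenberg.org/files/16/16-0.txt')
fhand = open('peterpan.txt', 'wb')
size = 0
while True:
    info = txt.read(100000)
    if len(info) < 1: break
    size = size + len(info)
    fhand.write(info)

print(size, 'characters copied.')
fhand.close()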

Checkpoint 13.18.9.

Complete the following code to print all the image sources from the webpage. Use the 'img' tag and the 'src' attribute.
import requests
from bs4 import BeautifulSoup

url = "https://www.nytimes.com/"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')
Solution.
Complete the following code to print all the image sources from the webpage. Use the 'img' tag and the 'src' attribute.
import requests
from bs4 import BeautifulSoup

url = "https://www.nytimes.com/"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')

tags = soup('img')
for tag in tags:
    print(tag.get('src', None))

Checkpoint 13.18.10.

Write code that extracts data from several parts of each 'a' tag on 'http://www.dr-chuck.com/page1.htm' using BeautifulSoup with html.parser, and prints the tag, the href, the contents, and all of the attributes.
import requests
from bs4 import BeautifulSoup
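Solution.
A sketch using requests and BeautifulSoup, following the pattern of Checkpoint 13.18.9; the 'TAG:', 'URL:', 'Content:', and 'Attrs:' print labels are illustrative, not required by the prompt:
import requests
from bs4 import BeautifulSoup

url = "http://www.dr-chuck.com/page1.htm"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')

# Retrieve all of the anchor tags and print the parts of each one
tags = soup('a')
for tag in tags:
    print('TAG:', tag)
    print('URL:', tag.get('href', None))
    print('Content:', tag.contents)
    print('Attrs:', tag.attrs)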