
Exercises 14.1 Multiple Choice Questions

1.

    Q-1: Given the below html, how would you describe this tag when searching for it with Beautiful Soup (for example, in soup.find())?
    <h1 class='sports'>Sports News</h1>
    
  • h1
  • Try again! The tag name must be in quotes, and this answer leaves out the class attribute.
  • h1, class='sports'
  • Try again! The tag name must be in quotes, and class must be written class_ because class is a reserved word in Python.
  • h1, class_='sports'
  • Try again! The tag name must be in quotes.
  • 'h1', class_='sports'
  • Correct! Both the tag and attribute are important. The h1 tag needs to be in quotes, and class has to be followed by an underscore.
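A minimal sketch of the correct answer in use, assuming Beautiful Soup 4 (bs4) is installed. The two-heading HTML string is made-up sample data, not from the original exercise; the second heading shows that the class_ filter really narrows the match.

```python
from bs4 import BeautifulSoup

# Made-up sample page with two h1 tags that differ only by class
html = "<h1 class='sports'>Sports News</h1><h1 class='weather'>Weather</h1>"
soup = BeautifulSoup(html, "html.parser")

# Tag name passed as a string; class filter passed as class_
# (class= would be a SyntaxError because class is a Python keyword)
heading = soup.find("h1", class_="sports")
print(heading.text)  # Sports News
```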

2.

    Q-2: Which line of code correctly gets the first item in items and makes the most sense following the below code snippet?
    soup = BeautifulSoup(response.content, 'html.parser')
    items = soup.find_all(class_='items')
    
  • first_item = items[0]
  • Correct! Given that soup.find_all(class_='items') returns a list, in order to get the first item, all you need to do is index.
  • first_item = items.find(0)
  • Try again! Since soup.find_all(class_='items') returns a list, we cannot use find(): lists have no find() method. (find() is a string method that returns the index of the first occurrence of a substring.)
  • first_item = items.get(0)
  • Try again! Since soup.find_all(class_='items') returns a list, we cannot use get(): it is a dictionary method that returns the value for a specified key, and lists do not have it.
  • first_item = items.find[0]
  • Try again! Lists have no find() method, and even where find() exists, methods are called with parentheses, not square brackets.
  • first_item = soup.items[0]
  • Try again! The results are already stored in the variable items, so all you need to do is index into that list. soup.items does not refer to the items variable.
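Here is a small sketch of why plain indexing works, assuming bs4 is installed; the list markup is invented for illustration. find_all() returns a ResultSet, which is a subclass of Python's list, so the usual list operations apply.

```python
from bs4 import BeautifulSoup

# Invented sample markup: two elements with class "items", one without
html = """
<ul>
  <li class="items">apple</li>
  <li class="items">banana</li>
  <li class="other">carrot</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
items = soup.find_all(class_="items")

first_item = items[0]            # ordinary list indexing
print(len(items))                # 2
print(first_item.text)           # apple
print(isinstance(items, list))   # True: ResultSet subclasses list
```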

3.

    Q-3: How does one parse the HTML into a BeautifulSoup object given a response object?
  • soup = BeautifulSoup(response.text, 'html.parser')
  • Correct! response.text holds the page already decoded into a Unicode string, which BeautifulSoup can parse.
  • soup = BeautifulSoup(response.content, 'html.parser')
  • Correct! response.content holds the raw page as bytes; BeautifulSoup detects the encoding and decodes it for you.
  • soup = BeautifulSoup(response.string, 'html.parser')
  • Try again! The response object has no .string attribute, so this raises an AttributeError. (.string belongs to Beautiful Soup tags, where it returns None if a tag has more than one child.)
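Because the two correct answers differ only in whether the page arrives as text or bytes, the sketch below uses literal stand-ins instead of a real HTTP response (an assumption to avoid a network call): page_text plays the role of response.text and page_bytes the role of response.content. It assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for a response object (no network call):
# .text would be a decoded str, .content would be the raw bytes.
page_text = "<html><body><p>Hello</p></body></html>"
page_bytes = page_text.encode("utf-8")

soup_from_text = BeautifulSoup(page_text, "html.parser")
soup_from_bytes = BeautifulSoup(page_bytes, "html.parser")  # encoding detected

print(type(page_text).__name__, type(page_bytes).__name__)  # str bytes
print(soup_from_text.p.text == soup_from_bytes.p.text)      # True
```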

4.

    Q-4: Which of the following is the best way to get the value for the id in the first p tag?
  • soup.p.get('id')
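  • Try again! This is close: get() already returns None when the p tag has no id attribute, so it will not raise an error, but passing None as the default makes that behavior explicit.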
  • soup.p.get('id', None)
  • Correct! soup.p gets the first p tag, and .get('id', None) returns the value of its id attribute, or None if the tag has no id.
  • soup.p[id]
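  • Try again! Without quotes, id refers to Python's built-in id() function rather than the string 'id', so this lookup raises a KeyError. Use the get() method to read an attribute safely.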
  • soup.p['id']
  • Try again! This works only if the p tag has an id attribute; if it does not, it raises a KeyError. The get() method returns None instead.
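A quick sketch contrasting get() with bracket lookup, assuming bs4 is installed; the two paragraphs are invented sample markup, and only the first one has an id.

```python
from bs4 import BeautifulSoup

html = '<p id="intro">First</p><p>Second</p>'
soup = BeautifulSoup(html, "html.parser")

print(soup.p.get("id", None))    # intro

second = soup.find_all("p")[1]   # this p has no id attribute
print(second.get("id", None))    # None: missing attribute, no error

try:
    second["id"]                 # bracket lookup requires the attribute to exist
except KeyError:
    print("KeyError")
```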

5.

    Q-5: How does one get the first level-1 heading (h1) tag after creating a soup object?
  • soup.h1
  • Correct! The level-1 heading tag is h1, and soup.h1 returns the first h1 tag in the document.
  • soup.header1
  • Try again! There is no tag called header1.
  • soup.h1[0]
  • Try again! soup.h1 is already a single tag, not a list. Indexing a tag looks up an attribute, so [0] raises an error.
  • soup.h1[1]
  • Try again! soup.h1 is already a single tag, not a list. Indexing a tag looks up an attribute, so [1] raises an error.
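The sketch below, assuming bs4 is installed and using made-up headings, shows that soup.h1 is shorthand for soup.find("h1"): it returns the first matching tag itself, and a tag name that does not exist simply gives None.

```python
from bs4 import BeautifulSoup

html = "<h1>Main title</h1><h2>Sub</h2><h1>Another title</h1>"
soup = BeautifulSoup(html, "html.parser")

first_h1 = soup.h1                    # shorthand for soup.find("h1")
print(first_h1.text)                  # Main title
print(first_h1 is soup.find("h1"))    # True: the same tag object
print(soup.header1)                   # None: no <header1> tag in the document
```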

6.

    Q-6: Which of the following gets the first link tag and returns a dictionary of all attributes and values for that link tag?
  • soup.a.attributes
  • Try again! A tag's attributes are stored in .attrs; .attributes does not return that dictionary.
  • soup.link.attrs
  • Try again! Hyperlinks use the a (anchor) tag. A link tag does exist in HTML, but it connects the page to resources such as stylesheets.
  • soup.a.attrs
  • Correct! This is the correct way to get the first link tag (soup.a) and get a dictionary of all attributes and values for that link tag (.attrs).
  • soup.link.attributes
  • Try again! Hyperlinks use the a tag, not the link tag (which connects the page to resources such as stylesheets), and .attributes does not return the attribute dictionary; use .attrs.
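A minimal sketch of .attrs, assuming bs4 is installed; the page fragment is invented and deliberately includes a real link tag (a stylesheet reference) next to an a tag, to show the difference between the two.

```python
from bs4 import BeautifulSoup

html = """
<head><link rel="stylesheet" href="style.css"></head>
<body><a href="https://example.com" id="home">Home</a></body>
"""
soup = BeautifulSoup(html, "html.parser")

link_attrs = soup.a.attrs        # dictionary of the first <a> tag's attributes
print(sorted(link_attrs))        # ['href', 'id']
print(link_attrs["href"])        # https://example.com

# <link> is a real tag too, but it points to resources such as stylesheets
print(soup.link.get("href"))     # style.css
```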

7.

    Q-7: Which of the following finds all link tags?
  • all_links = soup.find('a')
  • Try again! This will only find the first link.
  • all_links = soup.findall('a')
  • Try again! For Beautiful Soup, find_all requires an underscore.
  • all_links = soup.findall('link')
  • Try again! For Beautiful Soup, find_all requires an underscore. Also, hyperlinks use the a tag; link tags point to resources such as stylesheets.
  • all_links = soup.find_all('a')
  • Correct! This is the correct way to find all link tags. In HTML, hyperlinks are a (anchor) tags. For Beautiful Soup, find_all requires an underscore.
  • all_links = soup.find_all('link')
  • Try again! This finds link tags, which point to resources such as stylesheets; hyperlinks use the a tag.
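To see the difference between find() and find_all() side by side, here is a short sketch assuming bs4 is installed, with three made-up links as sample data.

```python
from bs4 import BeautifulSoup

html = '<a href="/one">One</a> <a href="/two">Two</a> <a href="/three">Three</a>'
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")      # a single tag (or None if there are no links)
all_links = soup.find_all("a")   # a list-like ResultSet of every <a> tag

print(first_link.text)                     # One
print([link.text for link in all_links])   # ['One', 'Two', 'Three']
```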

8.

    Q-8: Which of the following finds all paragraph tags with class b-soup?
  • all_links = soup.find_all('p', class='b-soup')
  • Try again! class is a reserved word in Python, so class='b-soup' is a syntax error; Beautiful Soup uses class_ (with an underscore) instead.
  • all_links = soup.find_all('paragraph', class='b-soup')
  • Try again! There is no tag called 'paragraph', instead we use tag 'p' to find paragraphs. Also, to find a class in Beautiful Soup, it requires an underscore (class_).
  • all_links = soup.find_all('p', class_='b-soup')
  • Correct! This is the correct way to find all paragraph tags with that class. In HTML, paragraph tags are 'p' tags, and Beautiful Soup filters by class with class_ (with an underscore).
  • all_links = soup.find_all('paragraph', class_='b-soup')
  • Try again! There is no tag called 'paragraph', instead we use tag 'p' to find paragraphs.
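A sketch of the tag-plus-class filter, assuming bs4 is installed; the markup is invented. It also shows two details worth knowing: an element with several classes still matches if any one of them is b-soup, and a div with the right class is excluded because the tag name must be p.

```python
from bs4 import BeautifulSoup

html = """
<p class="b-soup">Tomato</p>
<p class="salad">Greek</p>
<p class="b-soup hot">Pho</p>
<div class="b-soup">Not a paragraph</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Only <p> tags whose class list includes "b-soup"
matches = soup.find_all("p", class_="b-soup")
print([p.text for p in matches])   # ['Tomato', 'Pho']
```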

9.

    Q-9: After creating an empty dictionary and getting a list of all link tags, how does one put the link_tag text as keys and the link_tag href attribute as values for the dictionary?
  • loop through the elements of the list and do dictionary[link_tag.text] = a.get('href', None)
  • Try again! Although the 'a' tag is the link tag, the variable that contains the href attribute is link_tag.
  • loop through the elements of the list and do dictionary[link_tag.text] = a['href']
  • Try again! Although the 'a' tag is the link tag, the variable that contains the href attribute is link_tag. Also, the format tag['attribute_name'] raises an error if the attribute is missing.
  • loop through the elements of the list and do dictionary[link_tag.text] = link_tag.get('href', None)
  • Correct! This is the correct way to create a dictionary with link_tag text as keys and href values as values. .get('href', None) returns the href value when there is one and None otherwise, so it will not raise an error.
  • loop through the elements of the list and do dictionary[link_tag.text] = link_tag[href]
  • Try again! The attribute name is missing quotation marks, and the format tag['attribute_name'] raises an error if the attribute is missing.
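The correct answer written out as a full loop, assuming bs4 is installed. The three links are sample data invented for this sketch; the last one has no href, which shows why .get('href', None) is the safer choice.

```python
from bs4 import BeautifulSoup

html = """
<a href="https://example.com/news">News</a>
<a href="https://example.com/sports">Sports</a>
<a>No destination</a>
"""
soup = BeautifulSoup(html, "html.parser")

links = {}
for link_tag in soup.find_all("a"):
    # .get() returns None instead of raising KeyError when href is missing
    links[link_tag.text] = link_tag.get("href", None)

print(links)
# {'News': 'https://example.com/news', 'Sports': 'https://example.com/sports', 'No destination': None}
```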

10.

    Q-10: Given the below html, after importing re, what is printed when for tag in soup.find_all(re.compile("t")): print(tag.name) is run?
    <html>
       <head>
          <title>Site</title>
       </head>
       <body>
          <p>There is lots of content.</p>
       </body>
    </html>
    
  • html
  • Correct! html is printed because it is the name of a tag that contains the letter 't'.
  • title
  • Correct! title is printed because it is the name of a tag that contains the letter 't'.
  • Site
  • Try again! 'Site' is the text inside the title tag, not a tag name.
  • There is lots of content.
  • Try again! This isn't a tag. This is the content inside a 'p' tag.
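Running the question's loop on its own HTML confirms the answer. This sketch assumes bs4 is installed; Beautiful Soup matches a compiled regular expression against tag names with re.search(), so any name containing "t" anywhere matches.

```python
import re

from bs4 import BeautifulSoup

html = """
<html>
   <head>
      <title>Site</title>
   </head>
   <body>
      <p>There is lots of content.</p>
   </body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect the names of all tags whose name contains "t", in document order
names = [tag.name for tag in soup.find_all(re.compile("t"))]
print(names)  # ['html', 'title']
```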

11.

    Q-11: What does the following block of code do?
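    import urllib.request
    from bs4 import BeautifulSoup
    # ctx is an ssl.SSLContext created earlier (not shown)
    url = "https://www.nytimes.com"
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')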
    
  • retrieves and displays the webpage
  • Try again! This does not display the webpage. urllib.request retrieves the page, and BeautifulSoup parses it.
  • parses the html content of the "https://www.nytimes.com" webpage.
  • Correct! urllib.request retrieves the page, and BeautifulSoup parses all of its html tags and content.
  • downloads the webpage
  • Try again! The page is read into memory; this does not save any files to the computer.

12.

    Q-12: What does the following block of code print?
    import urllib.request
    from bs4 import BeautifulSoup
    
    url = "https://www.nytimes.com/"
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    
    tags = soup('img')
    for tag in tags:
        print(tag.get('src', None))
    
  • retrieves and displays the webpage
  • Try again! urllib retrieves the webpage but does not display it.
  • downloads the webpage
  • Try again! This does not save any files to the computer.
  • prints the images from 'www.nytimes.com'
  • Try again! The loop prints text (the src values), not images; BeautifulSoup and html.parser cannot display images.
  • prints all the 'img' sources under 'src' from 'www.nytimes.com'
  • Correct! It prints the src attribute of every img tag, or None for an img tag that has no src.
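The same loop can be tried offline. This sketch assumes bs4 is installed and replaces the live nytimes.com page with a small invented HTML string, so its output is only illustrative. Calling the soup object directly, as in soup('img'), is shorthand for soup.find_all('img').

```python
from bs4 import BeautifulSoup

# Invented stand-in for the downloaded page (no network call)
html = """
<img src="logo.png" alt="Logo">
<img alt="No source">
<img src="photo.jpg">
"""
soup = BeautifulSoup(html, "html.parser")

tags = soup("img")   # same result as soup.find_all("img")
sources = [tag.get("src", None) for tag in tags]
print(sources)       # ['logo.png', None, 'photo.jpg']
```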