Note 13.13.1.
If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
find
and find_all
to get data from a soup objectclass_
to find data with a particular CSS classtag.text
tag.get(attribute)
find
or find_all
to find either the first of a type of a tag or a list of a type of tag.requests
library to get a response object from a URL, create a BeautifulSoup
object from the HTML content in the response, use find
to find the first paragraph tag, and then print the first paragraph tag.import requests
from bs4 import BeautifulSoup
# get the response from the URL
url = "https://www.michigandaily.com/"
resp = requests.get(url)
# create the soup object
soup = BeautifulSoup(resp.content, 'html.parser')
# Print the first paragraph tag in the soup
print(soup.find('p'))
html.parser
is the HTML parser that is included in the standard Python 3 library. It is used to parse the HTML into a tree of tags. Information on other HTML parsers is available at:find_all
method on BeautifulSoup to get a list of all of the paragraphs.tagName.text
to get the text. You can also find a tag with a particular CSS class.import requests
from bs4 import BeautifulSoup
# create the soup object from the HTML
url = "https://www.michigandaily.com/"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')
# get the headline with the class and print its text
tag = soup.find("p", class_="site-description")
print(tag.text)
class_
as the keyword. This is becuase class
is already a keyword that is used to define a new class in Python.<a href="url">link text</a>
. An example is <a href="https://www.w3schools.com">W3Schools</a>
.import requests
from bs4 import BeautifulSoup
# get the soup object
url = "https://nytimes.com"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')
# print the href in each link (anchor) tag
tags = soup.find_all('a')
for tag in tags:
print(tag.get('href', None))
import requests
from bs4 import BeautifulSoup
# get the soup object
url = "https://www.michigandaily.com/"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')
# get all the li tags and find the first link (a) tag and print the href
li_list = soup.find_all("li", class_="menu-item-has-children")
for li in li_list:
a_tag = li.find('a')
print(a_tag.get("href",None))
import requests
from bs4 import BeautifulSoup
# get the soup object
url = "https://www.michigandaily.com/"
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'html.parser')
# get the div tag with a class of "jetpack_top_posts_widget"
# get all the li tags in the div tag
# loop through the li tags
# get the 'a' tag and print the value of the "href" attribute
https://cspogil.org/Home
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser