Blog Author Classification (Test)
The goal is a script that takes a list of URLs of blog posts and predicts the gender, age, and industry of the blog's author. To do this, we load the trained models from the project Blog Author Classification (Training) and apply all the necessary preprocessing steps to the downloaded posts.
Getting some Blog Posts
Task: Collect the URLs of several posts from one blog in a list. Choose a blog whose author's gender, age, and industry you know, so that you can check whether the models yield good predictions.
Solution:
# your solution
Task: Download all web pages in the list. Strip the HTML tags with Beautiful Soup's get_text and join all posts into one string.
Solution:
# your solution
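One possible sketch of this step, assuming the posts are plain HTML pages that can be fetched without a login. It uses urllib.request from the standard library for the download (requests would work just as well) and Beautiful Soup's get_text for stripping the tags. The function names and the User-Agent string are made up for illustration.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def html_to_text(html):
    """Strip all HTML tags and return the text content of a page."""
    soup = BeautifulSoup(html, "html.parser")
    # separator=" " keeps words from adjacent tags apart,
    # strip=True trims whitespace around each text fragment
    return soup.get_text(separator=" ", strip=True)


def download_posts(urls, timeout=10):
    """Download every URL and join the extracted texts into one string."""
    texts = []
    for url in urls:
        # some servers reject requests that carry no User-Agent header
        request = Request(url, headers={"User-Agent": "blog-author-test/0.1"})
        with urlopen(request, timeout=timeout) as response:
            texts.append(html_to_text(response.read()))
    return "\n".join(texts)
```

Note that the text of a whole page also contains menus, sidebars, and comments. Restricting get_text to the element that holds the post body would probably give cleaner input, but which element that is depends on the blog.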
Preprocessing
Task: Repeat all preprocessing steps from part 1 of the text processing chapter (remove punctuation, tokenize, lemmatize).
Solution:
# your solution
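The three steps could be sketched as follows. This is only a stand-in: the exact tokenizer and lemmatizer of the training project are not shown here, so tokenization is a simple whitespace split and the lemmatizer is passed in as a callable (by default it leaves tokens unchanged). The function name preprocess is an assumption.

```python
import string


def preprocess(text, lemmatize=lambda token: token):
    """Remove punctuation, tokenize on whitespace, lemmatize each token."""
    # map every ASCII punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.translate(table).split()
    return [lemmatize(token) for token in tokens]
```

With NLTK installed, `WordNetLemmatizer().lemmatize` could be passed as `lemmatize`. Whatever is used should match the preprocessing applied to the training data, otherwise the vectorizer will not recognize many tokens. Also note that string.punctuation covers ASCII characters only, so typographic quotes and dashes survive this sketch.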
Task: Load the three label lists and the vectorizer.
Solution:
# your solution
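How the objects are loaded depends on how the training project saved them. The following sketch assumes they were written with pickle, one object per file, into a common directory. The helper names and the `.pkl` file naming scheme are hypothetical.

```python
import pickle
from pathlib import Path


def load_pickled(path):
    """Load one Python object from a pickle file."""
    with Path(path).open("rb") as file:
        return pickle.load(file)


def load_artifacts(directory, names):
    """Load '<name>.pkl' for every name and return a dict name -> object."""
    directory = Path(directory)
    return {name: load_pickled(directory / f"{name}.pkl") for name in names}
```

A call could look like `load_artifacts("models", ["gender_labels", "age_labels", "industry_labels", "vectorizer"])`, with directory and names adapted to your own files. Only unpickle files you created yourself, and keep in mind that scikit-learn objects are best loaded with the same scikit-learn version that saved them.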
Task: Vectorize the lemmatized text.
Solution:
# your solution
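A minimal sketch of the vectorization, assuming the loaded vectorizer is a fitted scikit-learn vectorizer (for example CountVectorizer or TfidfVectorizer) that was trained on lemmas joined by single spaces. The function name vectorize is made up for illustration.

```python
def vectorize(vectorizer, lemmas):
    """Turn one list of lemmas into a feature matrix with a single row."""
    document = " ".join(lemmas)
    # transform expects an iterable of documents, so wrap the string in a list
    return vectorizer.transform([document])
```

The important detail is to call transform and not fit_transform: fitting again would build a new vocabulary, and the resulting columns would no longer match the features the models were trained on.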
Prediction
Task: Load the three saved SVC models.
Solution:
# your solution
Task: Predict the blog author's gender, age, and industry. Present the result in human-readable form.
Solution:
# your solution
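The prediction step might look like the sketch below. It assumes that each model predicts an integer class index and that the corresponding label list maps this index to a readable name. If your models were trained directly on string labels, the lookup in the label list is not needed. Function names and dictionary keys are hypothetical.

```python
def predict_author(models, label_lists, features):
    """Return a dict such as {'gender': 'male', ...} for one feature row."""
    result = {}
    for target, model in models.items():
        # predict returns one value per row; features holds exactly one row
        index = int(model.predict(features)[0])
        result[target] = label_lists[target][index]
    return result


def format_prediction(prediction):
    """Format the predictions as one 'target: label' line per target."""
    return "\n".join(f"{target}: {label}" for target, label in prediction.items())
```

Here `models` and `label_lists` are dictionaries with the same keys, for example "gender", "age", and "industry", and `features` is the single-row matrix produced by the vectorizer. Printing `format_prediction(predict_author(models, label_lists, features))` then gives one line per target.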