29 October 2014

Hackbright Day 22: Machine Learning

Wow, it's after midnight, but I've been incredibly productive, so that's a good thing. It's almost project time, and I'm flailing around trying to get everything in order for next week. I really want to do a machine learning project. I spent the evening scraping content from one of my favorite blogs. Its markup uses really nice CSS selectors, so I can use a Python library called BeautifulSoup to grab the ingredients from the recipes and then try some sort of clustering on them to see what's the what.
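I haven't picked a clustering method yet, so here's only a rough sketch of one way it could go, assuming each recipe has already been boiled down to a set of ingredient names. The recipe data, the function names, and the 0.3 threshold are all made up for illustration, and the greedy grouping is a stand-in for a real clustering algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity: shared ingredients divided by all ingredients."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_recipes(recipes, threshold=0.3):
    """Greedy single-pass grouping (not a real clustering algorithm).

    Each recipe joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of recipe names
    for name, ingredients in recipes.items():
        for cluster in clusters:
            if jaccard(ingredients, recipes[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([name])
    return clusters


# Hypothetical sample data, not scraped from any real blog.
recipes = {
    "pancakes": {"flour", "egg", "milk", "butter"},
    "waffles": {"flour", "egg", "milk", "sugar"},
    "salsa": {"tomato", "onion", "cilantro", "lime"},
    "pico de gallo": {"tomato", "onion", "cilantro", "jalapeno"},
}

print(cluster_recipes(recipes))
```

With this toy data the breakfast recipes land in one group and the salsas in another. Real scraped ingredient lists would need cleanup first (quantities, plurals, "finely chopped"), and the threshold would take some fiddling.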

And also, shell scripting? Still incredibly useful:

cat list_of_pages_to_scrape | awk '{print("grep "$1" toc.php?sort=date")}' | bash | sed 's/"/ /g' | sed 's/</ /g' | awk '{print($3)}' | awk '{print("curl -O -A PPPPPMozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36PPPPP  http://www.myfavoritefoodblog.com"$1)}'   | sed 's/PPPPP/"/g' | bash
