With rvest and SelectorGadget we don't need to know much HTML to do many basic tasks, but it's worth learning more so you can (a) do more advanced scraping and (b) create your own websites.

An HTML document is just a text file that follows specific patterns: it is made up of a hierarchy of tags. The first division in the hierarchy is the head and the body. The head contains metadata about the webpage, and the body contains its contents. For example, an anchor tag such as `<a href="…">I Kissed A Girl</a>` displays a link with the text I Kissed A Girl.

Once you have the HTML text, you could extract the information you want using regular expressions. Thanks to the open source community, you don't have to: rvest parses the HTML for you, much like the BeautifulSoup package in Python.

To get the lyrics of a single song, the `html_nodes` function grabs the node matching the `#songLyricsDiv` selector, and `html_text` extracts the text from this node. The result is a single string with line breaks encoded as `\n`:

```r
lyrics
# "This was never the way I planned\nNot my intention\nI got so brave\nDrink in hand\nLost my discretion\n..."
```

Using SelectorGadget again, we find that `#colone-container .tracklist a` is the CSS selector for all of the song names. Notice the pattern of the artist_url, where ARTIST is the artist's name in lower case with its spaces replaced. Now that we have the nodes for each song, we want to extract the song title and the URL of the song's webpage:

```r
# grab the song titles
song_titles <- html_text(song_nodes)
# "Hackensack (Fountains Of Wayne Cover)"

# grab the URL of each song's webpage
song_links <- html_attr(song_nodes, name = 'href')
```

Now we have all the pieces we need to scrape the lyrics to every Katy Perry song with a for loop.
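The scraping steps described above can be tried end to end without touching the network. The snippet below is a minimal offline sketch, not the original script: `artist_html` and `song_pages` are hypothetical stand-ins for pages a live scraper would download with `read_html(url)`, and their markup is invented so that it matches the `#colone-container .tracklist a` and `#songLyricsDiv` selectors found with SelectorGadget. It assumes rvest is installed and only uses `read_html`, `html_nodes`, `html_node`, `html_text` and `html_attr`.

```r
library(rvest)

# Hypothetical artist page: invented markup shaped to fit the
# '#colone-container .tracklist a' selector (not the real site's HTML).
artist_html <- '
<div id="colone-container">
  <ul class="tracklist">
    <li><a href="song-1.html">Song One</a></li>
    <li><a href="song-2.html">Song Two</a></li>
  </ul>
</div>'

# Hypothetical song pages, keyed by the link that points to them.
song_pages <- list(
  "song-1.html" = '<div id="songLyricsDiv">First line\nSecond line</div>',
  "song-2.html" = '<div id="songLyricsDiv">Another lyric</div>'
)

# select every song link on the artist page
artist_page <- read_html(artist_html)
song_nodes <- html_nodes(artist_page, '#colone-container .tracklist a')

# grab the song titles and the URL of each song's webpage
song_titles <- html_text(song_nodes)
song_links <- html_attr(song_nodes, name = 'href')

# scrape the lyrics of every song with a for loop
lyrics <- character(length(song_links))
for (i in seq_along(song_links)) {
  # offline lookup; a live scraper would call read_html(song_links[i]) instead
  song_page <- read_html(song_pages[[song_links[i]]])
  lyrics[i] <- html_text(html_node(song_page, '#songLyricsDiv'))
}
names(lyrics) <- song_titles
```

Against the live site, the lookup inside the loop would become `read_html(song_links[i])`, assuming the links are absolute URLs; pausing between requests with `Sys.sleep()` is a courteous addition when looping over many pages.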
To learn more about HTML, check out Codecademy's tutorial. For our purposes, we only need to understand enough HTML to access a page's contents (Iain knows more Katy Perry lyrics than HTML code…).