Saturday, June 20, 2015

Scraping a List of All Manga with Links in Python

I wrote a little script that scrapes every manga name and its homepage URL from MangaPanda's alphabetical index and saves them to a CSV file.

Here's the code:

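from lxml.html import parse

__author__ = 'Psycho_Coder'

BASE_URL = "http://www.mangapanda.com"


def main():
    # Fetch and parse the alphabetical index before opening the output file,
    # so a failed download does not leave an empty CSV behind.
    tree = parse(BASE_URL + "/alphabetical")
    # Every series is a link inside a <ul class="series_alpha"> list item.
    manga_name_list = tree.xpath("//ul[@class='series_alpha']/li/a/text()")
    manga_url_list = tree.xpath("//ul[@class='series_alpha']/li/a/@href")

    with open("mangalist.csv", "w") as f:
        f.write("\"Manga Name\", URL\n")
        # Names and hrefs come back in document order, so pair them up.
        for name, url in zip(manga_name_list, manga_url_list):
            # Drop double quotes from names so they cannot break the quoted field.
            f.write("\"{0}\", {1}{2}\n".format(name.replace("\"", ""), BASE_URL, url))


if __name__ == "__main__":
    main()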
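
The script builds each CSV row by hand and strips double quotes out of the names. As a rough sketch of an alternative (not what the script above does), Python's standard csv module can handle the quoting itself. The helper name write_manga_csv and the sample rows are made up for illustration; the rows stand in for the scraped name/href pairs.

```python
import csv
import io

BASE_URL = "http://www.mangapanda.com"


def write_manga_csv(rows, out):
    """Write (name, relative_url) pairs to the file-like object `out` as CSV.

    Hypothetical helper, not part of the original script.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Manga Name", "URL"])
    for name, href in rows:
        # csv.writer quotes fields that contain commas or double quotes,
        # so the name can be written unchanged.
        writer.writerow([name.strip(), BASE_URL + href])


# Made-up sample rows standing in for the scraped data.
sample = [("Naruto", "/naruto"), ('Say "I Love You"', "/say-i-love-you")]
buffer = io.StringIO()
write_manga_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file in Python 3, opening it with newline="" lets the csv module control the line endings.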

CSV: https://github.com/AnimeshShaw/MangaScrapper/blob/master/resources/mangalist.csv
Code on GitHub: https://github.com/AnimeshShaw/MangaScrapper/blob/master/resources/MangaList.py
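
To use the list from another script, the file can be read back with the csv module. This is a small sketch under one assumption: the rows look like the ones the script writes, with a space after the comma, which is why skipinitialspace=True is passed. The helper name read_manga_list and the inline sample text are illustrative only.

```python
import csv
import io


def read_manga_list(source):
    """Return a list of (name, url) tuples from CSV text in a file-like object.

    Hypothetical helper; assumes a header row followed by `"name", url` rows.
    """
    # skipinitialspace drops the space that follows each comma.
    reader = csv.reader(source, skipinitialspace=True)
    next(reader, None)  # skip the header row
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Made-up sample in the same layout as the script's output.
sample_text = '"Manga Name", URL\n"Naruto", http://www.mangapanda.com/naruto\n'
entries = read_manga_list(io.StringIO(sample_text))
print(entries)
```

Without skipinitialspace=True, each URL would come back with a leading space.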


I hope you liked this script. Please share this post on your social networks :)
