TextBlob is a Python (2 and 3) library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. TextBlob objects can be treated as if they were Python strings that learned how to do natural language processing. TextBlob depends heavily on NLTK and on the pattern module by CLiPS. The corpora used by NLTK are the default corpora for TextBlob as well. For installation instructions, click here.
Today we will take an overview of the library. In this part, the main focus is on the properties and methods of the BaseBlob class.
Basics of TextBlob and Tokenization
First we import the TextBlob class, which is arguably the most important class in the library.
In [2]:
# We import the most important class TextBlob
from textblob import TextBlob
We will apply the methods and properties of TextBlob to the following paragraph, and occasionally to some additional sentences.
In [4]:
data = """
Hello, My name is Animesh Shaw and I am an undergraduate and studying Computer Science (upcoming Graduation in 2015). I Love programming and Computer science subjects of topics.
My field of interest include Computational Linguistics. I Love watching anime specially One Piece and Naruto Shippunden. Animes specially those two shows
an extravagant amount of dedication, passion love, and amibition towards achieving one's goal and aim's in life. These have always inpired me a lot.
Giving up on your own dreams to fulfill others and the same feeling that other carry along with friends or something which I call as an eternal bond.
I recommend everyone to watch Naruto and One Piece. I have learnt a lot from there. "People are not always born intelligent or powerfull but with hard work
great heights can be achieved in life." Yes its true, dedication and hard work are the key goals to success less than 1% people are born with the blessing
of being a prodigy. The world was shaken mostly by those non-prodigy people which have had a massive impact in every individuals lives.
With great goals and constant dedication and passion you can achieve
the unachievable.
"""
To use any of TextBlob's methods or properties, we first create a TextBlob object.
In [5]:
tblob = TextBlob(data)
We will store all the words of the paragraph along with their POS tags in a variable tags. tags is a property of the TextBlob class that returns a list of tuples, each of the form (word, POS tag). All strings returned by TextBlob are Unicode strings.
In [6]:
tags = tblob.tags #We have stored the words of the text with the respective parts of speech tags
In [8]:
tags[:6] # Now that the tags are stored we will display the first 6 tags.
Out[8]:
NNP stands for a singular proper noun, which is used for the names of specific people, places, animals and so on. PRP stands for a personal pronoun.
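Since each entry in tags is a (word, tag) tuple, it can be unpacked directly in a loop or a list comprehension. The following is a minimal sketch of that idea. The sample_tags list is hand-written for illustration and is not real tagger output; with TextBlob available you would use tblob.tags instead.

```python
# Hand-written sample in the same (word, tag) shape that tblob.tags returns.
# These pairs are illustrative only, not real tagger output.
sample_tags = [
    ("My", "PRP$"),
    ("name", "NN"),
    ("is", "VBZ"),
    ("Animesh", "NNP"),
    ("Shaw", "NNP"),
    ("I", "PRP"),
]

# Unpack each tuple into a word and its tag, keeping only proper nouns.
proper_nouns = [word for word, tag in sample_tags if tag == "NNP"]

# PRP (personal) and PRP$ (possessive) both start with "PRP".
pronouns = [word for word, tag in sample_tags if tag.startswith("PRP")]

print(proper_nouns)  # ['Animesh', 'Shaw']
print(pronouns)      # ['My', 'I']
```

Unpacking into word and tag tends to read better than indexing with tag[0] and tag[1].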
Now let's print the tags that we stored earlier in the tags variable. We will print just the first 20 tags.
In [12]:
for tag in tags[:20]:
    print(str(tag[1]) + " ")
We can do the above by writing a single line of code too.
In [15]:
print("\n".join([tag[1] for tag in tags[:20]]))
Now let us have a look at all the tags in the data above
In [16]:
print(" ".join([tag[1] for tag in tags]))
In [17]:
# Now let us look at the total number of tags. We store all the tags in a variable named pos_tags.
pos_tags = [tag[1] for tag in tags]
# Now we print the length.
print("No. of tags : " + str(len(pos_tags)))
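Beyond the total, it can be useful to see how often each tag occurs. One way to do this, sketched below, is collections.Counter from the standard library. The sample_pos_tags list is a made-up stand-in for pos_tags so that the snippet runs on its own.

```python
from collections import Counter

# Made-up stand-in for pos_tags; in the notebook you would pass pos_tags itself.
sample_pos_tags = ["NNP", "PRP$", "NN", "VBZ", "NNP", "NNP", "CC", "PRP", "NN"]

# Counter maps each distinct tag to the number of times it appears.
tag_counts = Counter(sample_pos_tags)

# most_common(n) returns the n most frequent (tag, count) pairs.
for tag, count in tag_counts.most_common(2):
    print(tag, count)

print("Distinct tags:", len(tag_counts))
```

The number of keys in the Counter is the number of distinct tags.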
In [21]:
# As you may have noticed in entry 16, many tags repeat. We would like to get only the unique tags.
# The set() data structure does this by removing the duplicates.
unique_poses = set(pos_tags)
print(" ".join(unique_poses))
print("\nNo. of unique POS tags : " + str(len(unique_poses)))
As you can see, only 24 distinct POS tags are used; the rest are repetitions. Using TextBlob we can also print all the noun phrases in the text.
In [23]:
# print all the noun phrases
tblob.noun_phrases
Out[23]:
We can also get all the words by using the words property, which returns them as a WordList. A WordList is a list-like collection of words: it behaves like a Python list but offers additional methods.
In [26]:
tblob.words #returns the data as word tokenized form in a list.
Out[26]:
See, it's that easy. Noun phrases carry a lot of important and relevant information that can be used to analyse meaning further. tblob.noun_phrases also returns its result as a WordList, a class used to store words and operate on them with additional methods. Let's do something more.
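To get a feel for what word tokenization and a list of words make possible, here is a rough standard-library sketch. The regular expression is only an approximation chosen for this example, not the tokenizer TextBlob uses, and the sample text is made up.

```python
import re
from collections import Counter

text = "I love watching anime. I love programming too."

# Rough stand-in for word tokenization: runs of letters, digits and apostrophes.
words = re.findall(r"[A-Za-z0-9']+", text)

# Case-insensitive word frequencies.
counts = Counter(word.lower() for word in words)

print(words)
print(counts["love"])  # 2
```

A real tokenizer handles contractions, abbreviations and punctuation far more carefully, which is one reason to rely on the library's words property in practice.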
Language Detection and Translation
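Suppose that you want to detect the language used in the text above. TextBlob provides a detect_language() method for this purpose. The method uses the Google Translate API, so it needs an internet connection. Note that detect_language() and translate() were deprecated in TextBlob 0.16.0 and removed in later releases, so the examples in this section apply to older versions of the library.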
In [26]:
tblob.detect_language()
Out[26]:
Let's try some more, in different ways.
In [31]:
# Okay, let's try some more.
TextBlob("Bonjour").detect_language()
Out[31]:
In [36]:
#Another one
TextBlob("Ciao").detect_language()
Out[36]:
In the last two outputs, "fr" stands for French and "it" stands for Italian. Now that you know TextBlob can detect languages, you might wonder whether it can also translate. As a matter of fact, it can. Let's take a simple example: we will translate "Thanks" from English to Japanese.
In [38]:
TextBlob("Thanks").translate(to="ja")
Out[38]:
You can see that we got the translated text in Japanese.
Let's try another example with a longer sentence.
In [41]:
TextBlob("Hello, My name is Animesh P Shaw. I will become the Programming King").translate(to="fr")
Out[41]:
You might doubt whether these translations are correct. You can always check them with Google. Let's see whether the French translation above is detected as French.
In [42]:
TextBlob("Bonjour , Mon nom est Animesh P. Shaw . Je vais devenir le roi de programmation").detect_language()
Out[42]:
Ta da! The language of the above sentence has been detected as French, since fr is the language code for French.
Raw Text Handling
Let's explore more and see what we have got. We will print the complete text in raw form, which means that escape characters such as \n are shown as well. There is a built-in property for that purpose.
In [44]:
tblob.raw
Out[44]:
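The escape characters show up because a notebook displays the repr() of a bare expression rather than printing it. The short sketch below illustrates the difference with a made-up two-line string.

```python
text = "Hello,\nMy name is Animesh."

# repr() shows the escape sequence literally, as a notebook Out[] cell does.
print(repr(text))

# print() interprets the escape sequence as an actual line break.
print(text)
```

The string itself is the same in both cases; only the way it is displayed differs.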
raw_sentences is another property. It returns a list of raw sentences, which means that escape characters such as \n inside a sentence are kept.
In [45]:
tblob.raw_sentences
Out[45]:
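To illustrate what splitting text into raw sentences involves, here is a deliberately naive sketch based on a regular expression. It is not the algorithm TextBlob uses, and the naive_sentences helper and the sample string are made up for this example.

```python
import re

def naive_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'. A rough approximation only."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

sample = "I love\nanime. Do you? Yes!"

# The line break inside the first sentence is kept, as in a raw sentence.
print(naive_sentences(sample))  # ['I love\nanime.', 'Do you?', 'Yes!']
```

A rule this simple breaks on abbreviations such as "e.g." or "Dr.", which is why a trained sentence tokenizer is preferable for real text.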
Let us look at another property, sentences. It differs from raw_sentences in that it returns a list of Sentence objects rather than plain strings. Let's have a look at it.
In [46]:
tblob.sentences
Out[46]:
Sentiment Analysis with TextBlob
TextBlob is especially helpful for sentiment analysis, thanks to its built-in methods and properties, which you can configure and extend with different taggers or analyzers.
What is Sentiment Analysis?
Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. With TextBlob we can see both the polarity and the subjectivity of a sentence or a larger piece of text.
Now let's do something interesting and important: we will see how to measure the polarity of a sentence. So what is polarity? Polarity is a numerical value that tells us whether a sentence is positive or negative. Think of it this way: when someone says something bad about you, you feel sad, which corresponds to negative polarity; when someone praises you, you feel joy, which corresponds to positive polarity.
In [20]:
for sent in tblob.sentences:
    print(sent.sentiment.polarity)
A value of 0.0 indicates a neutral sentence and 0.5 a positive one. Note that the word "Love" signals positiveness. Values between 0.4 and 0.5 are more or less positive, though close to undecidable. Consider the second-to-last value, 0.25: it is low because of words such as "shaken" and "massive impact", which add a negative sense.
Now let's display both the polarity and the subjectivity. The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
In [21]:
for sent in tblob.sentences:
    print(sent.sentiment)
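Because the sentiment is a namedtuple, its fields can be read by name or unpacked like an ordinary tuple. The sketch below uses a stand-in Sentiment type with the same two fields so that it runs without TextBlob. The describe helper, its 0.1 threshold and the sample scores are assumptions made for illustration; they are not part of the library.

```python
from collections import namedtuple

# Stand-in with the same field names as the Sentiment namedtuple described above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

def describe(sentiment, threshold=0.1):
    """Label a score as positive, negative or neutral.

    The 0.1 threshold is an arbitrary choice for illustration.
    """
    if sentiment.polarity > threshold:
        return "positive"
    if sentiment.polarity < -threshold:
        return "negative"
    return "neutral"

# Made-up scores, not the output for the paragraph above.
scores = [Sentiment(0.5, 0.6), Sentiment(0.0, 0.0), Sentiment(-0.4, 0.9)]

labels = [describe(score) for score in scores]
print(labels)  # ['positive', 'neutral', 'negative']

# A namedtuple also unpacks like a plain tuple.
polarity, subjectivity = scores[0]
print(polarity, subjectivity)  # 0.5 0.6
```

In the notebook, the same helper could be applied to sent.sentiment for each sentence in tblob.sentences.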
Dumping Data Properties as JSON
Suppose that you want all the properties together in a format that is compact and easy to parse. For such cases TextBlob provides a way to dump the properties as JSON. For this example we will create a TextBlob instance with a shorter sentence: "Nico Robin is the most sexy anime character I have ever encountered."
In [39]:
blob = TextBlob("Nico Robin is the most sexy anime character I have ever encountered.")
print(blob.json)
If you want the same data as plain Python objects instead of a JSON string, use the serialized property, which returns a list with one dictionary per sentence.
In [25]:
blob.serialized
Out[25]:
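A JSON string like the one returned by blob.json can be turned back into Python objects with the standard json module. The sketch below uses a hand-written sample_json string: its key names follow what TextBlob reports per sentence as far as I know, but both the keys and the values here should be treated as assumptions for illustration, not as actual output.

```python
import json

# Hand-written stand-in for the string that blob.json returns: a JSON array
# with one object per sentence. Key names and values here are assumptions.
sample_json = """
[
  {
    "raw": "Nico Robin is a great character.",
    "noun_phrases": ["nico robin"],
    "polarity": 0.8,
    "subjectivity": 0.75
  }
]
"""

# Parse the JSON string into a list of dictionaries.
sentences = json.loads(sample_json)
for sentence in sentences:
    print(sentence["raw"], sentence["polarity"], sentence["subjectivity"])

# Going the other way: serialize the Python objects back to a JSON string.
round_tripped = json.dumps(sentences)
```

This round trip mirrors the relationship between the two properties: serialized gives the Python objects, and json gives their string form.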
Now let's test something nice. Suppose that we add " and beautiful lady " after "sexy" in In [39]. What changes do you expect? Since "beautiful" is a positive word, it increases the polarity value. This technique can be used in different ways in research.
In [40]:
blob = TextBlob("Nico Robin is the most sexy and beautiful lady anime character I have ever encountered.")
print(blob.json)
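The before-and-after comparison can be wrapped in a small helper that accepts any scoring function. The sketch below is self-contained and uses a toy lexicon with made-up scores, so toy_polarity is only a stand-in and does not reproduce TextBlob's analyzer or its numbers. The names TOY_LEXICON, toy_polarity and polarity_shift are hypothetical.

```python
import re

# Toy lexicon with made-up scores, for illustration only.
TOY_LEXICON = {"sexy": 0.5, "beautiful": 0.85}

def toy_polarity(text):
    """Average the toy scores of the known words; 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def polarity_shift(score, before, after):
    """Return how much the polarity changes between two versions of a text."""
    return score(after) - score(before)

before = "Nico Robin is the most sexy anime character I have ever encountered."
after = ("Nico Robin is the most sexy and beautiful lady anime character "
         "I have ever encountered.")

shift = polarity_shift(toy_polarity, before, after)
print(round(shift, 3))
```

With TextBlob installed, passing lambda text: TextBlob(text).sentiment.polarity as the score argument would compare the library's own scores instead of the toy ones.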
Summary
So that's all: we have come to the end of our first encounter with TextBlob. As you can see, I have explained things in a very detailed manner. This is definitely more in-depth than what is covered in the official tutorials. Stay tuned for more; I will continue this series and explain almost everything in the TextBlob library. Next time we will discuss spelling correction, n-grams, taggers and maybe lemmatization. Thank you for reading, and I hope you have had a nice read.
Read this Article in IPython-Notebook format here.