5 Comments
Subrat Sahoo

Hey, thanks for sharing your knowledge. This article was one of the best, and I'm waiting for your next one.

Btw, in the part about tokenisers, you used a library to break up the sentence. In that case, couldn't you have used the split method?

Or can you suggest use cases for it?

Abhishek Raj

No, token length varies between AI models. Suppose a model has a maximum token length of 4;

then "My name is Abhishek." becomes ["My", "name", "is", "Abhi", "shek", "."]

You can also write your own tokenizer function if you are writing the AI algorithm from scratch.
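
To make the difference concrete, here is a minimal sketch of such a hand-written function next to plain split. The function names and the fixed cap of 4 characters are assumptions taken from the example in this comment, not from the article; real models usually learn their subword vocabulary from data rather than cutting at a fixed length.

```python
import re


def split_tokens(text):
    """Whitespace tokenization with str.split; punctuation stays glued to words."""
    return text.split()


def capped_tokens(text, max_len=4):
    """Toy tokenizer (hypothetical): separate words from punctuation,
    then cut every piece into chunks of at most max_len characters."""
    tokens = []
    # \w+ matches a run of word characters, [^\w\s] matches one punctuation mark
    for piece in re.findall(r"\w+|[^\w\s]", text):
        for start in range(0, len(piece), max_len):
            tokens.append(piece[start:start + max_len])
    return tokens


sentence = "My name is Abhishek."
print(split_tokens(sentence))   # ['My', 'name', 'is', 'Abhishek.']
print(capped_tokens(sentence))  # ['My', 'name', 'is', 'Abhi', 'shek', '.']
```

Note that split leaves the full stop attached to "Abhishek.", which is one reason a dedicated tokenizer is often preferred over it.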

Subrat Sahoo

Got it

Subrat Sahoo

Also, I tried the tokenize code and needed to add nltk.download('punkt') to it.

Abhishek Raj

Yes, you will also need to pip install nltk.
