Here I am going to discuss some of the differences between the Text class in Hadoop and the String class in Java.
The Text class lives in the org.apache.hadoop.io package: import org.apache.hadoop.io.*;
Difference 1 :
Text is mutable; String is immutable.
Text t = new Text("hadoop");
t.set("BigData");
System.out.println(t); // prints "BigData"
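The same contrast can be seen without Hadoop on the classpath. The sketch below is plain Java and uses StringBuilder only as a stand-in for a mutable holder; it is not how Text is implemented, and the class and method names are made up for illustration.

```java
// Plain-Java sketch: StringBuilder stands in for a mutable holder such as Text.
final class MutabilityDemo {

    // "Changing" a String builds a new object; the original stays untouched.
    static String replaceInString(String original) {
        String changed = original.replace("hadoop", "BigData");
        return original + " / " + changed;
    }

    // A mutable holder is overwritten in place, similar in spirit to t.set(...).
    static String overwriteInPlace(StringBuilder holder, String newValue) {
        holder.setLength(0);      // discard the old contents
        holder.append(newValue);  // same object, new contents
        return holder.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceInString("hadoop"));

        StringBuilder holder = new StringBuilder("hadoop");
        overwriteInPlace(holder, "BigData");
        System.out.println(holder);
    }
}
```

The first method shows that the original String still reads "hadoop" after the replace, while the holder passed to the second method now contains the new value.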
Difference 2 :
Text stores the string in a byte buffer using standard UTF-8 encoding.
Example : Text t = new Text("hadoop");
The string is converted to a byte[] array of UTF-8 bytes and held in Text's internal byte buffer,
so "hadoop" is stored as [UTF-8 code(h), UTF-8 code(a), ..., UTF-8 code(p)].
This is the byte[] array representation of the string "hadoop":
[104,97,100,111,111,112]
Why ?
Text uses standard UTF-8, which makes it potentially easier to interoperate with other tools that
understand UTF-8.
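The byte values can be checked with plain Java, without touching Hadoop at all. This is only a sketch of the encoding step using String.getBytes; the class name Utf8BytesDemo and its helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain-Java sketch of the UTF-8 byte layout described above (no Hadoop calls).
final class Utf8BytesDemo {

    // Encode a String as UTF-8 bytes.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII characters take one byte each.
        System.out.println(Arrays.toString(utf8Bytes("hadoop")));
        // prints [104, 97, 100, 111, 111, 112]

        // A non-ASCII character such as U+00E9 takes more than one byte.
        System.out.println(utf8Bytes("\u00e9").length); // prints 2
    }
}
```

For pure ASCII input the byte count equals the character count, but that stops being true as soon as a character outside ASCII appears.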
Difference 3 :
charAt(int index) in String returns the char at the specified index.
charAt(int position) in Text returns the Unicode code point (an int) at the given byte position; for "hadoop", t.charAt(2) returns 100, the code point of 'd'.
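To see why a code point is returned as an int rather than a char, here is a plain-Java sketch that compares String.charAt with String.codePointAt. It does not call Text, and the class name CharAtDemo is made up for illustration.

```java
// Plain-Java sketch: char (UTF-16 code unit) versus code point (int).
final class CharAtDemo {

    // What String.charAt gives back: a single 16-bit char.
    static char charUnit(String s, int index) {
        return s.charAt(index);
    }

    // The full Unicode code point starting at the same index.
    static int codePoint(String s, int index) {
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        // For ASCII the two agree: 'd' is code point 100.
        System.out.println(charUnit("hadoop", 2));  // prints d
        System.out.println(codePoint("hadoop", 2)); // prints 100

        // U+1F600 does not fit in one char, so charAt sees only half of it.
        String smiley = "\uD83D\uDE00";
        System.out.println(Integer.toHexString(charUnit(smiley, 0)));  // prints d83d
        System.out.println(Integer.toHexString(codePoint(smiley, 0))); // prints 1f600
    }
}
```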
Difference 4 : Text lacks String's rich API for manipulating strings, so in many cases we
convert it to a String (with toString()) first.
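Difference 5 : Iterating over the characters of a Text is tedious compared to String.
Example of iterating over the characters in a Text (needs import java.nio.ByteBuffer;):
Text t = new Text("hadoop");
// wrap only the valid bytes; the array from getBytes() can be longer than getLength()
ByteBuffer bf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
int cp;
// bytesToCodePoint decodes one code point and advances the buffer; stop if it returns -1
while (bf.hasRemaining() && (cp = Text.bytesToCodePoint(bf)) != -1) {
System.out.print(Character.toChars(cp)); // also correct for code points that need two chars
}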
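For comparison, this is roughly what the same walk looks like on a plain String. It is a sketch using the standard String.codePoints() stream; the class name StringCodePointsDemo and its helper are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch: iterating over the code points of a String.
final class StringCodePointsDemo {

    // Collect every code point of the string, in order.
    static List<Integer> codePointsOf(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (int cp : codePointsOf("hadoop")) {
            // Character.toChars turns a code point back into one or two chars.
            System.out.print(Character.toChars(cp));
        }
        System.out.println(); // the loop above printed: hadoop
    }
}
```

No ByteBuffer or manual decoding is needed here, which is the convenience the Text version gives up.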
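Similarity 1 : find in Text is analogous to indexOf in String. (find returns a byte offset into the UTF-8 buffer and indexOf a char index; for ASCII text like this the two agree.)
Text t = new Text("hadoop");
String s = "hadoop";
System.out.println(" >>> " + t.find("o"));    // prints " >>> 3"
System.out.println(" >>> " + s.indexOf("o")); // prints " >>> 3"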
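The two results stop matching once a multi-byte character appears before the match, because a byte offset and a char index then count different things. The sketch below imitates a byte-offset search in plain Java; utf8ByteOffsetOf is a hypothetical helper written for this post, not a Hadoop method.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java sketch: char index (String.indexOf) versus UTF-8 byte offset.
final class FindOffsetDemo {

    // Hypothetical helper: byte offset of the first occurrence of `what`
    // in the UTF-8 encoding of `text`, or -1 if it does not occur.
    static int utf8ByteOffsetOf(String text, String what) {
        int charIndex = text.indexOf(what);
        if (charIndex < 0) {
            return -1;
        }
        // Everything before the match, measured in UTF-8 bytes.
        return text.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII only: both numbers are 3.
        System.out.println("hadoop".indexOf("o"));             // prints 3
        System.out.println(utf8ByteOffsetOf("hadoop", "o"));   // prints 3

        // U+00E9 takes two bytes, so the byte offset runs one ahead.
        String accented = "h\u00e9llo";
        System.out.println(accented.indexOf("l"));             // prints 2
        System.out.println(utf8ByteOffsetOf(accented, "l"));   // prints 3
    }
}
```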