Introduction - Simplified Learning

What is Serialization?

Serialization is the process of saving (writing) the state of an object to a file. Strictly speaking, it is the process of converting an object from its in-memory Java form into a form that can be stored in a file or sent over a network.
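As a minimal sketch of that definition, the following round trip writes an object to a byte array and reads it back with the standard java.io streams. The Person class and the method names serialize and deserialize are hypothetical, chosen only for illustration; a file or socket stream could take the place of the in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {
    // Converts the object's state into bytes (the file or network supported form).
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an object from the bytes produced by serialize().
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Person("Alice", 30));
        Person copy = (Person) deserialize(data);
        System.out.println(copy.name + " " + copy.age + " (" + data.length + " bytes)");
    }
}

// Hypothetical example class; any class implementing Serializable is handled the same way.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```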

Problems of Java Serialization

The default serialization mechanism provided in Java is not very efficient and has a number of well-known problems. Java serialization also does not work well if you want to share data with applications written in other languages such as C++ or Python.

Once an instance of a serializable class has been serialized, changing a class that relies on default serialization can leave you unable to deserialize that object from the serialized data stream.

The JVM generates a serial version ID for every serializable class that does not declare one and persists it with the serialized stream, so that at deserialization time it can check whether it still has the same compiled version of the class as at serialization time. Because the class was changed later, the automatically generated version ID changed too, so the ID stored in the stream no longer matches the ID of the compiled class, and this mismatch causes an InvalidClassException.

So you need to be judicious in deciding whether implementing Serializable is right for a class, because any later change to that class can cause problems when reading persisted data streams back into the JVM.
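The version ID described above can be inspected with java.io.ObjectStreamClass. The sketch below compares a class that leaves the ID to the runtime with one that declares serialVersionUID explicitly. The class names ImplicitId and ExplicitId and the helper idOf are hypothetical. Declaring the field is the usual way to keep the ID stable across compatible changes, though it does not make incompatible changes safe.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {
    // No explicit ID: the serialization runtime computes one from the class
    // structure, so adding or removing a field generally changes it.
    static class ImplicitId implements Serializable {
        int value;
    }

    // Explicit ID: stays the same until the developer changes it on purpose.
    static class ExplicitId implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Returns the ID that would be written to (and checked against) a serialized stream.
    static long idOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + idOf(ExplicitId.class));
        System.out.println("implicit: " + idOf(ImplicitId.class));
    }
}
```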

Serialization also carries resource overhead (both CPU and I/O) for serializing and deserializing the data, and adds latency when the data is transmitted over the network.

Further, serialization is quite slow. Moreover, XML serialization is insecure, consumes a lot of disk space, and works only on public members of public classes, not on private or internal classes. It therefore compels the developer to expose the class to the outside world.

Since serialization does not offer any transaction control mechanisms, it is not suitable for applications that need concurrent access unless additional APIs are used.

What are Google Protocol Buffers?

Google Protocol Buffers, also known as protobuf, are an efficient alternative for serializing objects. Protobuf is faster and simpler than XML and more compact than JSON. It was designed to be language- and platform-neutral and extensible. Protobuf supports C++, C#, Go, Java, and Python, among other languages. This tutorial gives an introduction to Google Protocol Buffers (protobuf) in Java.

Protocol Buffers are an open source encoding mechanism for structured data, developed at Google. They are useful for programs that communicate with each other over a wire or that store data. All you have to do is specify a message for each data structure you want to serialize (in a format that resembles a Java class) in a .proto specification file.

From that, the protocol buffer compiler (protoc) generates a class that implements automatic encoding and parsing of the protocol buffer data in an efficient binary format.
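To give a feel for why the binary format is compact, here is a hand-written sketch of how a single integer field is laid out on the wire: a key holding the field number and wire type, followed by the value as a base-128 varint. This is not the code protoc generates. The class and method names are hypothetical and only the varint wire type is covered.

```java
import java.io.ByteArrayOutputStream;

class VarintSketch {
    static final int WIRE_TYPE_VARINT = 0;

    // Base-128 varint: 7 payload bits per byte, least significant group first;
    // the high bit of a byte is set when more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // One field on the wire: key = (fieldNumber << 3) | wireType, then the value.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Field number 1 with value 150 takes three bytes in total.
        System.out.println(hex(encodeVarintField(1, 150))); // prints "08 96 01"
    }
}
```

The same field written as XML or JSON text would need the field name and the digits of the number as characters, which is where much of the size difference comes from.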

The generated Java class provides getters for the fields that make up a protocol buffer, with the matching setters on its builder, and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports extending the format over time in such a way that code can still read data encoded with the old format.
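One reason the format can evolve is that every field on the wire carries its own field number, so a reader can skip numbers it does not recognize. The sketch below illustrates the idea with a hand-written reader that only knows field 1 and is handed data that also contains a later-added field 2. It is a simplified illustration limited to varint fields, with hypothetical names, and is not the parser protobuf itself uses.

```java
import java.io.ByteArrayOutputStream;

class UnknownFieldSketch {
    // Writes a base-128 varint (7 bits per byte, high bit means "more follows").
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Appends one varint-typed field; the low 3 bits of the key hold wire type 0.
    static void writeField(ByteArrayOutputStream out, int fieldNumber, long value) {
        writeVarint(out, (long) fieldNumber << 3);
        writeVarint(out, value);
    }

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // "Old" reader: understands only field 1 and skips every other field number.
    // Returns the value of field 1, or 0 if the field is absent.
    static long readFieldOne(byte[] data) {
        long result = 0;
        int[] pos = {0};
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (wireType != 0) {
                throw new IllegalArgumentException("this sketch handles varint fields only");
            }
            long value = readVarint(data, pos);
            if (fieldNumber == 1) {
                result = value;
            }
            // Any other field number is read past and ignored.
        }
        return result;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream newer = new ByteArrayOutputStream();
        writeField(newer, 1, 42); // field the old reader understands
        writeField(newer, 2, 7);  // field added in a later version of the message
        System.out.println(readFieldOne(newer.toByteArray())); // prints 42
    }
}
```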

The protobuf API in Java is used to serialize and deserialize Java objects, so you don’t need to worry about any encoding or decoding details.

Advantages of Google Protocol Buffer

Data is fully typed.
Data is encoded in a compact binary format that is cheap to parse (less CPU usage).
Protocol Buffer messages are 3-10 times smaller than the equivalent XML.
Protocol Buffers are 10-100 times faster to process than XML.
Generated data access classes are easier to use programmatically.
Data can be read from many languages (C#, Java, Go, Python, JavaScript, etc.).
Code is generated automatically for you.

Disadvantages of Google Protocol Buffer

Protobuf support for some languages might be lacking (but the main ones are fine).
You can’t open the serialized data in a text editor (because it is a binary format, not text).

Note

Today Protocol Buffers are used at Google for almost all of their internal applications.
Google has over 48,000 protobuf message types in 12,000 .proto files.
If it’s working for Google, there’s a great chance it’ll work for you.
