Wednesday, 25 May 2016

Hibernate Batch Processing


* Batch processing is the execution of a series of programs (jobs) as a group. In practice it means reading data from a persistent store, processing it, and writing the processed data back to the persistent store.

* Batch processes are usually run when computer resources are less busy.

* Hibernate batch inserts rely on the flush() and clear() methods of the Session API.

* Suppose you need to insert a large number of records into the database using Hibernate. A naive approach looks like this:
Session session = sessionFactory.openSession(); // sessionFactory: an already-built SessionFactory instance
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Employee employee = new Employee(.....);
    session.save(employee);
}
tx.commit();
session.close();
     The first step in using batch processing is to set hibernate.jdbc.batch_size to a value such as 20 or 50, depending on object size. This tells Hibernate to send every X inserts to the database as one JDBC batch. The code then needs a small modification:
Session session = sessionFactory.openSession(); // sessionFactory: an already-built SessionFactory instance
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Employee employee = new Employee(.....);
    session.save(employee);
    if( (i + 1) % 50 == 0 ) { // after every 50th save; same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
Advantage: Batch processing helps to avoid OutOfMemoryError, because the session cache no longer grows with every saved object.
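The effect of flushing and clearing at a fixed interval can be illustrated without a database. The following is a minimal sketch in plain Java, not Hibernate code: ToyBatchSession and BatchDemo are hypothetical names, and the "session" only counts how many objects it is holding. The sketch counts saves with (i + 1) so that the first flush happens after a full batch of 50.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a session: it only tracks how many objects are held in memory.
// This is NOT Hibernate code; all names here are hypothetical.
class ToyBatchSession {
    private final List<Object> cache = new ArrayList<>();
    private int peakCacheSize = 0;
    private int flushCount = 0;

    void save(Object entity) {
        cache.add(entity);
        peakCacheSize = Math.max(peakCacheSize, cache.size());
    }

    // Models flush() followed by clear(): pending work is written out,
    // then the cached objects are released.
    void flushAndClear() {
        flushCount++;
        cache.clear();
    }

    int peakCacheSize() { return peakCacheSize; }
    int flushCount() { return flushCount; }
    int pending() { return cache.size(); }
}

class BatchDemo {
    // Saves `total` objects, flushing and clearing after every `batchSize` saves.
    static ToyBatchSession run(int total, int batchSize) {
        ToyBatchSession session = new ToyBatchSession();
        for (int i = 0; i < total; i++) {
            session.save(new Object());
            if ((i + 1) % batchSize == 0) { // true after the 50th, 100th, ... save
                session.flushAndClear();
            }
        }
        return session;
    }

    public static void main(String[] args) {
        ToyBatchSession s = run(100000, 50);
        System.out.println("peak cache size = " + s.peakCacheSize()); // 50
        System.out.println("flushes = " + s.flushCount());            // 2000
    }
}
```

In this model the cache never holds more than one batch of objects, whereas without flushAndClear() it would grow to all 100000.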
public void refresh(Object object) throws HibernateException;
     This method re-reads the state of the given object from the database, so that the session data matches the database data. To understand why refresh() matters, consider the following scenarios.
Case 1: single session, calling get() once
Application code
Session session = SessionUtil.getSession();
session.getTransaction().begin();

Account account = (Account) session.get(Account.class, 1001);
System.out.println("Before updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
// Breakpoint: go to the database and modify the data
System.out.println("After updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
session.getTransaction().commit();
session.close();
Output:
Before updating the database...
Name : Ashok Kumar
Balance : 5780.0
After updating the database...
Name : Ashok Kumar
Balance : 5780.0
Explanation
     When we call get() on the session object, Hibernate queries the database, creates an entity object, and assigns the retrieved data to it. That entity object is then cached in the session.

     When the data is later changed directly in the database, the session does not see the change. It keeps showing the cached data.

Case 2: single session, calling get() multiple times
Application code
Session session = SessionUtil.getSession();
session.getTransaction().begin();

Account account = (Account) session.get(Account.class, 1001);
System.out.println("Before updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
// Breakpoint: go to the database and modify the data
account = (Account) session.get(Account.class, 1001);
System.out.println("After updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
session.getTransaction().commit();
session.close();
Output:
Before updating the database...
Name : Ashok Kumar
Balance : 5780.0
After updating the database...
Name : Ashok Kumar
Balance : 5780.0
Explanation
     When we call get() on the session object a second time, Hibernate first checks whether the object is already available in the session. If it is, Hibernate does not query the database.

     In the example above, the Account object with ACCNO 1001 is already in the session. So the second get() call does not query the database, and the output shows the previously loaded data instead of the updated record.
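The lookup order described in Cases 1 and 2 (session cache first, database only on a miss) can be modeled in a few lines of plain Java. This is a toy sketch under stated assumptions, not Hibernate's implementation: ToyAccount, ToyCachingSession, and CacheDemo are hypothetical names, and a HashMap stands in for the ACCOUNT table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class ToyAccount {
    final int id;
    double balance;
    ToyAccount(int id, double balance) { this.id = id; this.balance = balance; }
}

// Toy model of a session-level cache. Not Hibernate code.
class ToyCachingSession {
    private final Map<Integer, Double> table;                        // stands in for the ACCOUNT table: id -> balance
    private final Map<Integer, ToyAccount> cache = new HashMap<>();  // objects already loaded by this session
    private int selectCount = 0;

    ToyCachingSession(Map<Integer, Double> table) { this.table = table; }

    ToyAccount get(int id) {
        ToyAccount cached = cache.get(id);
        if (cached != null) {
            return cached;          // cache hit: no query is issued
        }
        selectCount++;              // cache miss: one "select" against the table
        Double balance = table.get(id);
        if (balance == null) {
            return null;            // no such row
        }
        ToyAccount loaded = new ToyAccount(id, balance);
        cache.put(id, loaded);
        return loaded;
    }

    int selectCount() { return selectCount; }
}

class CacheDemo {
    public static void main(String[] args) {
        Map<Integer, Double> table = new HashMap<>();
        table.put(1001, 5780.0);
        ToyCachingSession session = new ToyCachingSession(table);

        ToyAccount first = session.get(1001);
        table.put(1001, 9500.0);                // the row is changed outside the session
        ToyAccount second = session.get(1001);  // served from the session cache

        System.out.println(first == second);        // true
        System.out.println(second.balance);         // 5780.0
        System.out.println(session.selectCount());  // 1
    }
}
```

The second get() returns the very same object without touching the table, which is why the stale balance is printed.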

Case 3: creating multiple sessions
Application code
// Assumes SessionUtil.getSession() opens a new Session on each call
Session session1 = SessionUtil.getSession();
Session session2 = SessionUtil.getSession();
session1.getTransaction().begin();
session2.getTransaction().begin();
Account account = (Account) session1.get(Account.class, 1001);
System.out.println("Before updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
// Breakpoint: go to the database and modify the data
account = (Account) session2.get(Account.class, 1001);
System.out.println("After updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
session1.getTransaction().commit();
session2.getTransaction().commit();
session1.close();
session2.close();
Output:
Before updating the database...
Name : Ashok Kumar
Balance : 5780.0
After updating the database...
Name : Ashok Kumar
Balance : 9500.0
Explanation
     In the example above, session2 has no objects associated with it. So when we call get() on session2, Hibernate executes a select query, retrieves the record, and the output shows the updated data.

     The drawback is that we need a new session every time we want the updated record. The refresh() method solves this problem.
Case 4: using refresh()
Application code
Session session = SessionUtil.getSession();
session.getTransaction().begin();
Account account = (Account) session.get(Account.class, 1001);
System.out.println("Before updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
// Breakpoint: go to the database and modify the data
session.refresh(account);
System.out.println("After updating the database...");
System.out.println("Name : " + account.getName());
System.out.println("Balance : " + account.getBalance());
session.getTransaction().commit();
session.close();
Output:
Before updating the database...
Name : Ashok Kumar
Balance : 5780.0
After updating the database...
Name : Ashok Kumar
Balance : 9500.0
Explanation
     In the example above, when we call refresh(), Hibernate executes the select query again and overwrites the state of the object with the current database values. That is why the output shows the updated balance within the same session.
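The difference between a cached get() and refresh() can be sketched with a toy model in plain Java. This is an illustration only, not Hibernate code: RefreshAccount, ToyRefreshingSession, and RefreshDemo are hypothetical names, and a HashMap plays the role of the ACCOUNT table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class RefreshAccount {
    final int id;
    double balance;
    RefreshAccount(int id, double balance) { this.id = id; this.balance = balance; }
}

// Toy session with a cache and a refresh operation. Not Hibernate code.
class ToyRefreshingSession {
    private final Map<Integer, Double> table;                            // stands in for the ACCOUNT table
    private final Map<Integer, RefreshAccount> cache = new HashMap<>();  // session-level cache
    private int selectCount = 0;

    ToyRefreshingSession(Map<Integer, Double> table) { this.table = table; }

    RefreshAccount get(int id) {
        RefreshAccount cached = cache.get(id);
        if (cached != null) {
            return cached;                       // served from the cache
        }
        Double balance = readRow(id);
        RefreshAccount loaded = new RefreshAccount(id, balance);
        cache.put(id, loaded);
        return loaded;
    }

    // Re-reads the row and overwrites the state of the existing object in place.
    void refresh(RefreshAccount account) {
        account.balance = readRow(account.id);
    }

    private Double readRow(int id) {
        selectCount++;                           // every call models one "select"
        Double balance = table.get(id);
        if (balance == null) {
            throw new IllegalStateException("no row with id " + id);
        }
        return balance;
    }

    int selectCount() { return selectCount; }
}

class RefreshDemo {
    public static void main(String[] args) {
        Map<Integer, Double> table = new HashMap<>();
        table.put(1001, 5780.0);
        ToyRefreshingSession session = new ToyRefreshingSession(table);

        RefreshAccount account = session.get(1001);
        table.put(1001, 9500.0);                 // the row is changed outside the session
        System.out.println(account.balance);     // 5780.0 (stale cached state)

        session.refresh(account);
        System.out.println(account.balance);     // 9500.0 (state re-read from the table)
    }
}
```

Note that refresh() updates the same object instance, so every reference to it within the session sees the new state.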

public void flush() throws HibernateException;
     Forces the session to flush, that is, to synchronize the session data with the database by executing the pending SQL statements.
* When you call session.flush(), the pending statements are executed in the database, but they are not committed.

* If you do not call session.flush() and call transaction.commit() directly, commit() flushes the session internally and then commits.

* So commit() = flush() + commit.

* In short, session.flush() executes the pending statements in the database without committing them, after which they are no longer queued in memory. The entity objects themselves remain in the session cache until clear() or close() is called.
Student stu = (Student)session.get(Student.class, 1);
stu.setStudentName("Ashok Kumar");
stu.setStudentDept("MCA");
session.flush();
// Put break point and observe console
session.getTransaction().commit();
session.close();
* On session.flush(), Hibernate compares the current state of the Student object with the state it had when it was loaded. If there is a difference, it executes an update query to write the object data to the database, but it does not commit.

* On transaction.commit(), Hibernate performs the same check again. If there are still unflushed changes, it executes the update query, and then it commits the transaction.

* An explicit session.flush(), when needed, must be called before committing the transaction and closing the session.
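The relationship commit() = flush() + commit can be modeled with a small plain-Java sketch. It is a toy under stated assumptions, not Hibernate code: ToyFlushSession and FlushDemo are hypothetical names, and the model assumes that other connections can read only committed rows.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of flush versus commit. Not Hibernate code.
class ToyFlushSession {
    private final Map<Integer, String> committedRows;                     // what other connections can read (assumption)
    private final Map<Integer, String> queued = new LinkedHashMap<>();    // changes held only in the session
    private final Map<Integer, String> executed = new LinkedHashMap<>();  // sent to the database, not yet committed
    private int statementsExecuted = 0;

    ToyFlushSession(Map<Integer, String> committedRows) { this.committedRows = committedRows; }

    // Models changing a property of a loaded entity.
    void setName(int id, String name) { queued.put(id, name); }

    // Executes the queued changes inside the open transaction, without committing.
    void flush() {
        statementsExecuted += queued.size();
        executed.putAll(queued);
        queued.clear();
    }

    // Flushes whatever is still queued, then makes everything visible.
    void commit() {
        flush();
        committedRows.putAll(executed);
        executed.clear();
    }

    int statementsExecuted() { return statementsExecuted; }
}

class FlushDemo {
    public static void main(String[] args) {
        Map<Integer, String> rows = new HashMap<>();
        rows.put(1, "Old Name");
        ToyFlushSession session = new ToyFlushSession(rows);

        session.setName(1, "Ashok Kumar");
        System.out.println(session.statementsExecuted()); // 0 (nothing sent yet)

        session.flush();
        System.out.println(session.statementsExecuted()); // 1 (update executed)
        System.out.println(rows.get(1));                  // Old Name (not committed)

        session.commit();
        System.out.println(rows.get(1));                  // Ashok Kumar
        System.out.println(session.statementsExecuted()); // 1 (nothing left to flush)
    }
}
```

Calling commit() without an earlier flush() gives the same final rows in this model, because commit() flushes first.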

Batch Processing with flush() method
     Consider a requirement to insert a large number of records into the database using Hibernate. A first attempt looks like this:
Session session = sessionFactory.openSession(); // sessionFactory: an already-built SessionFactory instance
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Employee emp = new Employee(.....);
    session.save(emp);
}
tx.commit();
session.close();
     This code may throw an OutOfMemoryError somewhere around the 50,000th row, because Hibernate caches all the newly inserted Employee objects in the session-level cache. Hibernate batch processing solves this problem. First set hibernate.jdbc.batch_size in hibernate.cfg.xml as below:
<property name="hibernate.jdbc.batch_size">40</property>
So Hibernate sends every 40 inserts to the database as one batch. The code above has to be changed to:
Session session = sessionFactory.openSession(); // sessionFactory: an already-built SessionFactory instance
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Employee emp = new Employee(.....);
    session.save(emp);
    if( (i + 1) % 40 == 0 ) { // after every 40th save; same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
Batch processing helps to avoid OutOfMemoryError.

public Object merge(Object object) throws HibernateException;
Consider the following example.
Application Code:
session.getTransaction().begin();
Account account1 = (Account) session.get(Account.class, 1001);
Account account2 = new Account();
account2.setAccountId(1001);
account2.setName("cherry");
account2.setBalance(6500);
session.update(account2);
session.getTransaction().commit();
Output: org.hibernate.NonUniqueObjectException: a different object with the same identifier value was already associated with the session: [com.ashok.hibernate.entity.Account#1001]
* A session cannot hold two different objects of the same type with the same identifier.

* In the example above, get() places the account1 object with identifier 1001 in the session. The update() call then tries to bring account2, which has the same identifier 1001, into the session as well. That conflict causes the exception. To avoid it, use merge().

* If we use merge() instead of update() in the example above, no exception is thrown, and the data of account2 is written to the database.

* merge() behaves differently in different scenarios: it can insert, update, or merge data. The following cases illustrate each behavior.
Case 1: merge() inserts the data
Application Code:
Session session = SessionUtil.getSession();
session.getTransaction().begin();
Account account = new Account();
account.setAccountId(1002);
account.setName("Vinod Kumar");
account.setBalance(4500.00);
session.merge(account);
session.getTransaction().commit();
session.close();
After Execution
     In the example above, merge() first tries to load the Account object with identifier 1002. Since the ACCOUNT table has no record with ACCNO 1002, Hibernate inserts the Account(1002, Vinod Kumar, 4500.00) object into the database.

Case 2: merge() updates the data

Application Code:
Session session = SessionUtil.getSession();
session.getTransaction().begin();
Account account = new Account();
account.setAccountId(1002);
account.setName("Hari");
account.setBalance(5600.00);
session.merge(account);
session.getTransaction().commit();
session.close();
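After Execution
     In the example above, merge() again tries to load the Account object with identifier 1002. This time the ACCOUNT table already has a record with ACCNO 1002, so Hibernate copies the new state onto the loaded object and updates the existing record instead of inserting a new one.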

Case 3: merge() merges the detached object data into the persistent object
Application Code:
Session session = SessionUtil.getSession();
session.getTransaction().begin();
Account account1 = (Account) session.get(Account.class, 1001);
Account account2 = new Account();
account2.setAccountId(1001);
account2.setName("Vinod Kumar");
account2.setBalance(6500);
session.merge(account2);
session.getTransaction().commit();
session.close();
After Execution
* In the example above, before the merge() call account1 is in the persistent state and account2 is in the detached state.

* When we call merge(), Hibernate checks whether any object with the same identifier (1001) is already associated with the session.

* Here the account1#1001 object is already associated with the session, so merge() copies the state of account2#1001 into account1#1001. After the merge() call, account1 is still in the persistent state and account2 is still in the detached state.

* When the transaction is committed, the data of account1#1001 (persistent state) has been modified, so Hibernate executes an update query to synchronize the session data with the database.

Difference between merge and update
update(): Use update() when you are sure that the session does not already contain a persistent instance with the same identifier.
merge(): Use merge() when you want to save your modifications at any time, regardless of the state of the session.
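The contrast between update() and merge() can be sketched with a toy session in plain Java. This is a simplified model, not Hibernate's implementation: MergeAccount, ToyMergeSession, and MergeDemo are hypothetical names, a HashMap stands in for the ACCOUNT table, and an IllegalStateException stands in for NonUniqueObjectException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class MergeAccount {
    final int id;
    String name;
    double balance;
    MergeAccount(int id, String name, double balance) {
        this.id = id; this.name = name; this.balance = balance;
    }
}

// Toy session that tracks one managed object per identifier. Not Hibernate code.
class ToyMergeSession {
    private final Map<Integer, MergeAccount> table;                       // stands in for the ACCOUNT table
    private final Map<Integer, MergeAccount> managed = new HashMap<>();   // persistent objects in this session

    ToyMergeSession(Map<Integer, MergeAccount> table) { this.table = table; }

    MergeAccount get(int id) {
        MergeAccount m = managed.get(id);
        if (m == null && table.containsKey(id)) {
            MergeAccount row = table.get(id);
            m = new MergeAccount(row.id, row.name, row.balance);         // load a copy of the row
            managed.put(id, m);
        }
        return m;
    }

    // Attaches the given instance itself; fails if another instance already holds that id.
    void update(MergeAccount a) {
        MergeAccount existing = managed.get(a.id);
        if (existing != null && existing != a) {
            throw new IllegalStateException("a different object with the same identifier is already in the session: " + a.id);
        }
        managed.put(a.id, a);
    }

    // Copies the state of the given instance onto the managed one and returns the managed one.
    MergeAccount merge(MergeAccount a) {
        MergeAccount m = get(a.id);
        if (m == null) {
            m = new MergeAccount(a.id, a.name, a.balance);               // no row yet: inserted on commit
            managed.put(a.id, m);
        } else {
            m.name = a.name;
            m.balance = a.balance;
        }
        return m;
    }

    // Writes every managed object back to the table.
    void commit() {
        for (MergeAccount m : managed.values()) {
            table.put(m.id, new MergeAccount(m.id, m.name, m.balance));
        }
    }
}

class MergeDemo {
    public static void main(String[] args) {
        Map<Integer, MergeAccount> table = new HashMap<>();
        table.put(1001, new MergeAccount(1001, "Ashok Kumar", 5780.0));
        ToyMergeSession session = new ToyMergeSession(table);

        MergeAccount account1 = session.get(1001);
        MergeAccount account2 = new MergeAccount(1001, "Vinod Kumar", 6500.0);

        MergeAccount merged = session.merge(account2);
        System.out.println(merged == account1);      // true
        System.out.println(account1.name);           // Vinod Kumar

        session.merge(new MergeAccount(1002, "Hari", 5600.0));
        session.commit();
        System.out.println(table.containsKey(1002)); // true
    }
}
```

In this model, calling session.update(account2) after get(1001) throws, while merge(account2) leaves account2 untouched and changes account1 instead.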
public Serializable getIdentifier(Object object) throws HibernateException; 
     To get the identifier value of an object at runtime, call getIdentifier(Object object).
Application code:
Account account = (Account) session.get(Account.class, 1001);
Serializable id = session.getIdentifier(account);
System.out.println("Identifier of Account is : "+ id);
