Friday, February 7, 2014

The curious case of the incorrect Java version

I have both JRE6 and JRE7 on my system. I had set the 'path' system variable to point to JRE6, but whenever I executed 'java -version' at the command prompt, the reported version was 7. Now this was a bit baffling!

Apparently, there was a java.exe file in my System32 folder, and this file was getting executed instead of the one pointed to by the 'path' variable. I deleted those files and things now function as expected.

Now the question arises: what was a java.exe file doing in the System32 folder? I guess automatic Java updates pushed remotely by the system administrator at my company could be the reason.

I spent quite some time debugging this issue. This is what happens when somebody else messes with your dev machine without letting you know :-|
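A quick way to diagnose this sort of mix-up is to ask the running JVM itself which installation it actually is. A minimal sketch (the class name WhichJava is mine; the properties queried are standard Java system properties):

```java
public class WhichJava {
    public static void main(String[] args) {
        // Whichever java.exe actually launched this JVM reports its own
        // version and install path, regardless of what 'path' appears to say.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}
```

If java.home is not the JRE you configured in 'path', some other java.exe earlier in the lookup order (such as one in System32) is being picked up.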


Thursday, October 10, 2013

String.split() issue in IE8

String.split() behaves differently in IE8 and Chrome/FF when the delimiter is a regex containing a capturing group. In Chrome/FF the method returns an array of all the tokens plus the captured delimiter strings, in proper order. In IE8 it returns only an array of tokens (the captured delimiter strings are omitted).

For example

Input -->
String: The train starts from {%1}{%2n} at {%3} on {%4t, dd-MM-yyyy}
Delimiter Regex: ({.*?})

Output -->
Chrome/FF: ["The train starts from ", "{%1}", "", "{%2n}", " at ", "{%3}", " on ", "{%4t, dd-MM-yyyy}", ""]
IE8: ["The train starts from ", " at ", " on "]

To fix this IE8 anomaly I wrote a custom split method that returns the delimiter strings as well. You can find it here: 

It's a good starting point. Additional changes can be made to support regex modifiers.

BONUS: IE8 also doesn't support String.trim(). Add the following to your code to make it work:
           // add a trim function if it's not present (IE8 doesn't support trim)
           if (typeof String.prototype.trim !== 'function') {
                String.prototype.trim = function() {
                    return this.replace(/^\s+|\s+$/g, '');
                };
           }

Hope that helps :)


Tuesday, April 10, 2012

Garbage Collection in Java

Pulp Fiction:
This post will cover the basics of 'Garbage Collection', which is a part of Java's automated memory management. The matter presented here has been compiled from Oracle/Sun's documentation and various posts freely available on the internet. I have tried to present things in a simplified manner and have willingly left out some intricate details. I suggest that after going through this post you should, at least once, hear the story straight from the horse's mouth, which can be found here

Life is beautiful

One of the major reasons I love Java is its automated memory management. As opposed to languages that have no automated memory management in place, Java relieves the programmer from worrying about freeing allocated memory. This lets the programmer concentrate on implementing the business logic rather than tracing the causes of those pesky memory leaks. Also, as the programmer is not allowed to fiddle directly with memory blocks, he cannot accidentally (or purposely) crash the system by incorrectly freeing allocated memory. This restriction ensures program integrity. Apart from this, some garbage collectors also help combat heap fragmentation (not covered in this post).


One point to note here is that garbage collection activities consume the very same system resources that are meant to run the application. This may slow down the application or make it appear unresponsive while garbage collection is in progress. Also, since it is automated, we do not have much say in when it takes place. Fortunately, garbage collection algorithms have improved vastly over time, and hence slow/unresponsive behavior, if any, can largely be taken care of by appropriate heap-space tuning and by using the garbage collector that best suits our needs (more on this later in the post).

Some Definitions

Starting with basic definitions: objects that are no longer needed are called garbage, and the process of collecting and throwing away garbage to make space for new objects is called garbage collection. The process that does garbage collection is called the garbage collector.

When does a non-garbage object become a garbage object?

The answer to the above question lies in the following four scenarios:
1) All references to the object are explicitly set to null (e.g. obj = null)
2) The object is created inside a block and its reference goes out of scope
3) The parent object is set to null (child objects reachable only through it become garbage too)
4) The object has only live references via a WeakHashMap
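To make the first, second and fourth scenarios concrete, here is a small sketch (the class name GarbageScenarios is mine; scenario 3 is omitted for brevity, and note that System.gc() is only a hint - collection is never guaranteed):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class GarbageScenarios {
    public static void main(String[] args) {
        // 1) The only reference is explicitly set to null
        Object obj = new Object();
        obj = null;                      // that object is now unreachable

        // 2) A reference created inside a block goes out of scope
        {
            Object scoped = new Object();
        }                                // 'scoped' no longer exists here

        // 4) The object is reachable only as a key in a WeakHashMap
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, "value");
        key = null;                      // only the weak key reference remains
        System.gc();                     // a hint only; collection is not guaranteed
        System.out.println("entries left: " + cache.size());  // typically 0 after a GC
    }
}
```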

How to identify garbage objects?

Naive Approach

The naive algorithm is to iterate over every reachable object and use a flag to mark it as live. The objects left over are considered garbage and are removed. This is a straightforward approach, but the time taken to finish each cycle of garbage collection is proportional to the number of live objects. Hence, this approach will not scale in applications that maintain a large set of live data. Can we do any better?

Generational Collection

It has been empirically observed in many programs that the most recently created objects are also those most likely to become unreachable (garbage) quickly. In other words, objects have a very high infant mortality. Using this observation, memory in Java is managed in generations, i.e. objects of different ages are stored in different memory pools.

For simplicity's sake, the memory here is divided into two parts. The first part is called the Young Generation (YG) and holds newly created objects. The second part is called the Tenured Generation (TG) and holds objects that have been around for a while.

All newly created objects are placed in the YG. When the YG becomes full, garbage collection takes place in it; garbage collection in the YG is called minor collection. During a minor collection all garbage objects in the YG are removed. The live objects left in the YG are then moved to the TG if they are old enough to be tenured, else they remain in the YG. This goes on till the TG gets filled, upon which garbage collection takes place in the TG. This is called major collection.

If a minor collection takes place but there is not sufficient space in the TG to hold the tenured objects, a full garbage collection (Full GC) takes place, which is simply a minor plus major collection.

A bit more about Minor Collection

Now, let's see in detail how minor collection (garbage collection in the YG) works. The YG is divided into 3 parts - Eden, Survivor1 (S1) and Survivor2 (S2). All newly created objects are placed in Eden. Once Eden gets filled, minor collection takes place and garbage objects in Eden are removed. The remaining objects are moved to S1. When Eden gets filled again, minor collection takes place and all garbage objects in Eden and S1 are removed. The remaining objects from both pools are moved to S2. Similarly, in the next minor collection, live objects in Eden and S2 are moved back to S1. Hence, objects keep oscillating between Survivor1 and Survivor2 till they come of age. Once they do, they are moved to the TG in the next minor collection. An important point to note is that at any point of time at least one of S1 or S2 will always be empty.
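You can actually see these pools on a running JVM via the standard java.lang.management API. A sketch (the class name ShowPools is mine; the exact pool names, e.g. 'Eden Space' vs 'PS Eden Space', depend on which collector the JVM is using):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class ShowPools {
    public static void main(String[] args) {
        // Lists every memory pool the JVM manages. With a generational
        // collector you will see Eden, Survivor and Tenured/Old pools
        // among them, alongside non-heap pools.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getName() + " : " + pool.getType());
        }
    }
}
```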

Metrics for Performance

There are different algorithms for garbage collection. Their efficacy is measured on the following criteria:
1) Throughput - Percentage of total time not spent in garbage collection
2) Pauses - Times when an application appears unresponsive because garbage collection is happening
3) Footprint - Working set of a process measured in pages and cache lines
4) Promptness - Time between when an object becomes dead and when the memory becomes available.

The desired qualities of a good garbage collection algorithm are high throughput and promptness, and low pauses and footprint. But depending on the use-case, one factor may be traded for another to achieve the desired behavior. For example:
Batch processing jobs: the aim is to finish the job in the least possible time, even if that requires long pauses. So here high throughput is given more preference than low pauses.
Real-time processing: here pauses should be low even if overall throughput is not at its optimum best.
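The time an application actually spends in collection (the flip side of throughput) can be observed through the standard GarbageCollectorMXBean interface. A sketch (the class name GcStats is mine; collector names such as 'Copy' or 'PS Scavenge' depend on which collector the JVM picked):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Churn through short-lived objects to trigger a few minor collections.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] junk = new byte[128];
        }
        // Print, per collector, how many collections ran and how long they took.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " : count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + "ms");
        }
    }
}
```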

Now, let's see how the sizes of the different generations can affect the above-mentioned metrics (we can set the relative sizes using the command-line parameter -XX:NewRatio; for example, -XX:NewRatio=3 means YG:TG = 1:3, i.e. the size of the YG is a quarter of the total heap size).

Size of 'Young Generation' is large:

If the size of the YG is large, minor collections will be less frequent. This increases throughput, but at the expense of footprint, promptness and pause times. Also, if the heap is of fixed size, a large YG implies a smaller TG, and a smaller TG leads to an increase in the frequency of major collections.

Size of 'Young Generation' is small:

A small YG will minimize pauses at the expense of throughput.

From the above analysis we can infer that there is no one right way to size the generations! The best bet is to let the JVM take care of the sizes, or to play around with different generation sizes and see what suits your application best.

Now that we have covered the basics of garbage collection, we will briefly look at 3 types of collectors that ship with the JVM. But before that we need to understand one more important concept called the Young Generation Guarantee.

Young Generation Guarantee

The Young Generation Guarantee means that the TG must have enough free memory to accommodate all the live objects promoted from the YG. Let me illustrate this with a simple example. Suppose we run an application that creates objects none of which ever become garbage. After the Eden space is filled, minor collections will oscillate the objects between S1 and S2 till they come of age and are eventually moved to the TG. Hence, the TG must be large enough to accommodate all the live objects, which means it must be at least as large as the combined space occupied by Eden and the larger of the two survivor spaces. Garbage collectors which provide a Young Generation Guarantee are called Young Generation Collectors.

Different Types of Garbage Collectors

1) Serial Collector

This garbage collector provides the Young Generation Guarantee and is hence a Young Generation Collector. Prior to J2SE 5.0, the JVM used the serial collector by default for both minor and major collections. In later versions, the JVM became smart enough to choose a collector based on the class of machine on which the application is started.

2) Throughput Collector

In the Throughput Collector, minor collections happen using a parallel version of a Young Generation Collector, while major collections happen using a serial collector.
The Throughput Collector is a good choice for an application running on a machine with a large number of processors, since minor collection is done by multiple threads.

3) Concurrent Low Pause Collector

In the Concurrent Low Pause Collector, minor collections happen using a parallel version of a Young Generation copying collector, while most of the major collection happens concurrently with the execution of the application. So, if an application is using 10 processors, during major collection the application can continue using 9 of them while the remaining one is used for collection. The point to note here is that the Concurrent Low Pause Collector can be used only if the application can share processor resources with the garbage collector while it is running.

Since garbage collection takes place concurrently with the application, the Concurrent Low Pause Collector is a good choice for applications that can benefit from shorter garbage collector pauses, e.g. interactive applications. It is also a good choice when the application has a relatively large set of long-lived data and is running on a machine with multiple processors.

There are quite a few other Garbage Collector implementations details of which are beyond the scope of this article.
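For reference, on HotSpot JVMs of that era each of these collectors could also be selected explicitly on the command line (flag names taken from Sun's 5.0 GC tuning guide; 'MyApp' is a placeholder, and flag availability varies by JVM version):

```shell
java -XX:+UseSerialGC MyApp          # serial collector
java -XX:+UseParallelGC MyApp        # throughput collector
java -XX:+UseConcMarkSweepGC MyApp   # concurrent low pause collector
```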


Interesting Links:

  1. Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
  2. Java's garbage-collected heap
  3. How Garbage Collection works in Java
  4. Taming the Java Garbage Collector
  5. Java Garbage Collection Distilled

Tuesday, May 4, 2010

Deleting MySQL Database log files

We have a master database and its slave running on different nodes. The master DB is subjected to a HUGE number of inserts/updates, because of which the log files it writes grew to enormous proportions (at one time the log files occupied more than 80% of the disk space).

Log files are important because with them we can track the changes made to the DB, and in case of a DB crash we can rebuild it from scratch. But in our case we didn't require the log files, and had the luxury of getting rid of them instead of finding reliable storage to keep them safe.

We decided to go ahead with deleting the log files. We automated the process by setting a flag in the configuration file so that MySQL automatically deletes log files older than 90 days.

This can be done as follows:

1) Stop the Master MySQL server.

2) Add the following parameter to the my.cnf file (the figure on the right-hand side is the number of days after which a log file is purged):

expire_logs_days = 90

3) Start the server.

Simple !!!

In case you just need to delete the log files without automating the process, it can be done using the following command (use the desired date value):

PURGE BINARY LOGS BEFORE '2010-05-03 14:56:23';

Since the replication slave reads the log files, we need to make sure we only delete files that were created before the log file the slave node is currently reading. We can get this information by running the following command on the slave database:

show slave status;

This command gives details about the slave. The value of the column 'Master_Log_File' is the name of the log file currently being read by the slave. Once we have the name of this file, we can get its creation date. Let's assume the creation date was '2010-05-03 14:56:23'. All log files created before this time can then be deleted with the following command:

PURGE BINARY LOGS BEFORE '2010-05-03 14:56:23';

MySQL v5 documentation for purging log files

Thursday, July 2, 2009

Move MySQL database location

A MySQL DB might be using a location as its data repository where hard-disk space is limited. This warrants moving the data from that location to some other location that can provide more room to accommodate the increasing data. The following steps move the data to a new location:

du -h : shows the disk usage of the current folder (handy for checking how much data has to be moved)


--> Set the password for mysqlmanager (required if a password is not set already)
mysqlmanager --passwd >> /etc/mysqlmanager.passwd

--> Log into the MySQL manager
mysql -uroot -psh*pp@r --socket=/tmp/mysqlmanager.sock

--> Kill all instances of mysql
ps afx | grep mysql    // get the process ids
kill -9 4465 4473 4474 // kill those processes

--> Create the directory where you want to transfer the files
mkdir /mnt/mysql/marketingportal

--> Give mysql the ownership of the directory and its contents (presently they are owned by root)
chown -R mysql.mysql /mnt/mysql/    // mysql.mysql : user.group

--> Open the folder where the current data is
cd mysql-marketingportal/

--> Copy this data to the new folder
cp -R * /mnt/mysql/marketingportal/

--> Give mysql the ownership of the new directory and its contents (presently they are owned by root)
chown -R mysql.mysql /mnt/mysql/    // mysql.mysql : user.group

--> Change the name of the old directory (optional - this makes sure it isn't being used any longer)
mv mysql-marketingportal/ mysql-marketingportal-old/

--> Edit the my.cnf file and change the value of datadir

--> Start the mysqlmanager
nohup mysqlmanager --user=mysql > mysqlmanager.out &

Sunday, February 22, 2009

High Performance Database Tips

Some Database mantras discovered on the basis of practical experience.
  • Use the smallest datatype possible, e.g. where an integer can take care of all possible states, don't use bigint. Every bit of extra space allotted makes the DB grow in size without any actual utility, thereby taking a toll on performance.
  • Choose what to index and how much to index. Indexing creates metadata about the data. If data is fetched on the basis of a particular attribute, make that attribute an index; it will definitely speed up reads and updates. Also keep in mind that indexing increases insert time, and the index metadata occupies disk space. Partial indexing is a good way to create indexes not on the complete field but on the deciding part of it, e.g. an index can be created on the first few characters of URLs if data is fetched on the basis of host name.
  • Choose the storage engine wisely, e.g. if the DB is not relational, go for MyISAM. Read the pros and cons of each engine before choosing one.
  • Controlled data redundancy. The higher the normalisation of a DB, the lesser the redundancy. But one thing to keep in mind is that retrieving data from such a DB might require joining tables, which is a very expensive operation. If the DB is huge, or if the data to be fetched is distributed across a significant number of tables, performance will take the hit. One has to make an educated choice between space and performance. My experience tells me that a bit of redundancy isn't that bad!
  • Fetch only relevant data. If you require some fields of a table, retrieve only those fields. Don't do a 'select *'.
  • Better programming logic: when programmatically accessing the DB, reuse connections, e.g. if we need to fire 2 queries on the same DB schema to retrieve logically related sets of data, try using the same connection. Creating a connection is costly. Also, make sure to close all connections when their use is over.
  • Don't run all servers on the same node. If the application receives a huge number of hits and tremendous DB operations are taking place (like a huge number of inserts/reads/writes), it's better to host the application on a node different from the DB. There is only so much a single node can handle.

Thursday, January 22, 2009

Write SQL query output on file

Query into outfile 'filename';

Query - query to be fired
filename - name of the file to which the data is to be written. A full path can also be given. Make sure the folder already exists.

eg. select * from HOTEL_DETAILS into outfile 'C:/datafolder/sqldata.txt';
