Tuesday, May 4, 2010

Deleting MySQL Database log files

We have a master database and its slave running on different nodes. The master DB handles a huge number of inserts/updates, because of which the log files it writes grew to enormous proportions (at one point the log files occupied more than 80% of the disk space).

Log files are important because with them we can track the changes made to the DB, and in case of a DB crash we can rebuild it from scratch. But in our case we didn't need the log files, and we had the luxury of getting rid of them instead of finding reliable storage to keep them safe.

We decided to go ahead with deleting the log files. We automated the process by setting a flag in the configuration file so that MySQL automatically deletes log files older than 90 days.

This can be done as follows:

1) Stop the Master MySQL server.

2) Add the following parameter to the my.cnf file (the figure on the right-hand side is the number of days after which a log file is purged):

expire_logs_days = 90
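
For reference, a minimal my.cnf fragment. This is a sketch assuming the setting lives in the [mysqld] section, which is where server options normally go:

[mysqld]
expire_logs_days = 90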

3) Start the server.

Simple !!!


In case you just need to delete the log files once, without automating the process, it can be done with the following command (use the desired date value):

PURGE BINARY LOGS BEFORE '2010-05-03 14:56:23';

Since the replication slave reads these log files, we need to make sure we only delete files that were created before the log file the slave node is currently reading. We can get this information by running the following command on the slave database:

show slave status;

This command gives the details about the slave. The value of the column 'Master_Log_File' is the name of the log file currently being read by the slave. Once we have the name of this file, we can get its creation date. Let's assume the creation date was '2010-05-03 14:56:23'. All log files created before this time can be deleted with the following command:

PURGE BINARY LOGS BEFORE '2010-05-03 14:56:23';
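
If you prefer purging by file name rather than by date, MySQL also supports PURGE BINARY LOGS TO. A sketch with a hypothetical log file name (substitute the Master_Log_File value obtained from the slave):

PURGE BINARY LOGS TO 'mysql-bin.000123';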

References:
MySQL v5 documentation for purging log files

Thursday, July 2, 2009

Move MySQL database location

The MySQL data directory may be at a location where hard-disk space is limited. This warrants moving the data to some other location that can provide more room to accommodate the growing data. The following steps move the data to a new location:


du -h : shows the disk usage of the current folder

--------------------


-->> Set the password for mysqlmanager (required if password is not set already).
mysqlmanager --passwd >> /etc/mysqlmanager.passwd

--> log into mysql manager
mysql -uroot -psh*pp@r --socket=/tmp/mysqlmanager.sock

-->> kill all instances of mysql
ps afx | grep mysql //get process ids
kill -9 4465 4473 4474 // kill those processes

-->> create the directory where you want to transfer the files
mkdir /mnt/mysql/marketingportal

-->> Give mysql the ownership of the directory and its contents (presently they are owned by root)
chown -R mysql.mysql /mnt/mysql/ // mysql.mysql = user.group

--> Open the folder where the current data is
cd mysql-marketingportal/

--> Copy this data to the new folder
cp -R * /mnt/mysql/marketingportal/

-->> Give mysql the ownership of the new directory and its contents (the copies are again owned by root)
chown -R mysql.mysql /mnt/mysql/ // mysql.mysql = user.group

-->> Change the name of the old directory (optional - this is done to make sure it isn't being used any longer)
mv mysql-marketingportal/ mysql-marketingportal-old/

-->> edit the my.cnf file
change the value of datadir
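For example, with the new location used above (the path is just this post's example; adjust to yours):

datadir = /mnt/mysql/marketingportal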

-->> Start the mysqlmanager
nohup mysqlmanager --user=mysql > mysqlmanager.out &

Sunday, February 22, 2009

High Performance Database Tips

Some Database mantras discovered on the basis of practical experience.
  • Use the smallest datatype possible, e.g. where an integer can take care of all possible states, don't use bigint. Every bit of extra space allotted makes the DB grow in size without any actual utility, thereby taking a toll on performance.
  • Choose what to index and how much to index. An index is metadata about the data. If data is fetched on the basis of a particular attribute, make that attribute an index; it will definitely speed up reads and updates. Also keep in mind that indexing increases insert time and that the index metadata occupies disk space. Partial (prefix) indexing is a good way to create indexes not on the complete field but on the deciding part of it, e.g. an index on the first few characters of a URL column if data is fetched on the basis of host name.
  • Choose the storage engine wisely, e.g. if the DB is not relational, go for MyISAM. Read the pros and cons of each engine before choosing one.
  • Controlled data redundancy. The higher the normalisation of a DB, the lower the redundancy. But one thing to keep in mind is that retrieving data from such a DB may require joining tables, which is a very expensive operation. If the DB is huge, or if the data to be fetched is spread across a significant number of tables, performance will take the hit. One has to make an educated choice between space and performance. My experience tells me that a bit of redundancy isn't that bad!
  • Fetch only relevant data. If you require only some fields of a table, retrieve only those fields. Don't do a 'select *'.
  • Better programming logic: when programmatically accessing the DB, reuse connections. E.g. if we need to fire two queries on the same DB schema to retrieve logically related data, try using the same connection; creating a connection is costly (see the JDBC sketch after this list). Also, make sure to close all connections once you are done with them.
  • Don't run all servers on the same node. If the application receives a huge number of hits and tremendous DB activity is also taking place (huge numbers of inserts/reads/writes), it's better to host the application on a node different from the DB. There is only so much a single node can handle.
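
To illustrate the connection-reuse point, here is a minimal JDBC sketch. The connection URL, credentials, and table name are placeholders, and it assumes the MySQL JDBC driver is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionReuseExample {
    public static void main(String[] args) throws Exception {
        // Open one connection and reuse it for two related queries
        // instead of opening a fresh connection for each.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
        try {
            Statement stmt = conn.createStatement();
            try {
                ResultSet rows = stmt.executeQuery("SELECT id, name FROM HOTEL_DETAILS");
                while (rows.next()) {
                    System.out.println(rows.getInt("id") + " : " + rows.getString("name"));
                }
                rows.close();

                // Second, related query fired on the same connection.
                ResultSet count = stmt.executeQuery("SELECT COUNT(*) FROM HOTEL_DETAILS");
                if (count.next()) {
                    System.out.println("total rows: " + count.getInt(1));
                }
                count.close();
            } finally {
                stmt.close();
            }
        } finally {
            conn.close(); // always release the connection when done
        }
    }
}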

Thursday, January 22, 2009

Write SQL query output on file

query INTO OUTFILE 'filename';

Query - the query to be fired
filename - name of the file to which the output will be written. A full path can also be given. Make sure the folder already exists.


eg. select * from HOTEL_DETAILS into outfile 'C:/datafolder/sqldata.txt';
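
If you need delimited output (e.g. CSV), INTO OUTFILE also accepts field and line terminators. A sketch using the same example table and an assumed output path:

select * from HOTEL_DETAILS into outfile 'C:/datafolder/sqldata.csv' fields terminated by ',' enclosed by '"' lines terminated by '\n';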


Further Reading:
http://gilfster.blogspot.com/2005/11/writing-sql-results-to-file.html

Wednesday, December 3, 2008

MySQL Table and Schema Size

I found these two commands for finding the size of a table or a whole schema on a MySQL mailing list (unable to recollect the URL).


Query To Find the size of a schema

#Query to find the size of a database schema (replace YOUR_SCHEMA_NAME with appropriate name)

#Run this query without selecting any schema

SELECT s.schema_name,
       CONCAT(IFNULL(ROUND((SUM(t.data_length) + SUM(t.index_length)) / 1024 / 1024, 2), 0.00), "Mb") total_size,
       CONCAT(IFNULL(ROUND(((SUM(t.data_length) + SUM(t.index_length)) - SUM(t.data_free)) / 1024 / 1024, 2), 0.00), "Mb") data_used,
       CONCAT(IFNULL(ROUND(SUM(data_free) / 1024 / 1024, 2), 0.00), "Mb") data_free,
       IFNULL(ROUND((((SUM(t.data_length) + SUM(t.index_length)) - SUM(t.data_free)) / ((SUM(t.data_length) + SUM(t.index_length))) * 100), 2), 0) pct_used,
       COUNT(table_name) total_tables
FROM INFORMATION_SCHEMA.SCHEMATA s
LEFT JOIN INFORMATION_SCHEMA.TABLES t ON s.schema_name = t.table_schema
WHERE s.schema_name = "YOUR_SCHEMA_NAME"
GROUP BY s.schema_name
ORDER BY pct_used DESC\G

-------------------------------------------------------

Query To Find the size of a table

# Query to find size of a particular table (replace YOUR_TABLE_NAME with appropriate name)

# Run this after choosing the schema

show table status like 'YOUR_TABLE_NAME';
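
Alternatively, the same figures can be pulled from INFORMATION_SCHEMA (a sketch; replace the schema and table placeholders):

SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'YOUR_SCHEMA_NAME' AND table_name = 'YOUR_TABLE_NAME';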

Thursday, September 18, 2008

Get log files from S3 that were generated within a given timeframe

Amazon S3 log files are named as:

TargetPrefix-YYYY-MM-DD-HH-MM-SS-UniqueNumber

TargetPrefix is what we chose
YYYY-MM-DD-HH-MM-SS is the timestamp
UniqueNumber is something we should not care about

S3 returns a list of objects (files) in a bucket if we give it the desired prefix.

Now, to get the list of log files created within a specific timeframe, we can use the following algorithm:

beginTime : YYYY-MM-DD 10:25:15
endTime : YYYY-MM-DD 13:00:00

1) get all files with prefixes YYYY-MM-DD-10-25-16 to YYYY-MM-DD-10-25-59 (second granularity, to finish the current minute)
2) get all files with prefixes YYYY-MM-DD-10-26 to YYYY-MM-DD-10-59 (minute granularity, to finish the current hour)
3) get all files with prefixes YYYY-MM-DD-11 to YYYY-MM-DD-13 (hour granularity, up to and including the end hour)

One could further optimise the algorithm to fetch files on a 'day of the month' basis, but that would not be a good idea because the list of files returned by S3 would be very long.

Following is a Java method that implements the above logic:


/**
 * Returns a list of prefixes covering (beginTime + 1 second) up to the
 * hour value of endTime. The minute and second values of endTime are ignored.
 *
 * @param beginTime    start of the window (exclusive)
 * @param endTime      end of the window (only its hour part is used)
 * @param targetPrefix the TargetPrefix configured for the bucket's logging
 * @return list of prefixes to query S3 with
 */
public List<String> getPrefixList(Timestamp beginTime, Timestamp endTime, String targetPrefix) {
    List<String> prefixList = new ArrayList<String>();

    SimpleDateFormat sdfhr = new SimpleDateFormat("yyyy-MM-dd-HH");
    SimpleDateFormat sdfmin = new SimpleDateFormat("yyyy-MM-dd-HH-mm");
    SimpleDateFormat sdfsec = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");

    Calendar beginCal = Calendar.getInstance();
    beginCal.setTimeInMillis(beginTime.getTime());
    beginCal.set(Calendar.MILLISECOND, 0);

    // Truncate endTime to the hour.
    Calendar endCal = Calendar.getInstance();
    endCal.setTimeInMillis(endTime.getTime());
    endCal.set(Calendar.MINUTE, 0);
    endCal.set(Calendar.SECOND, 0);
    endCal.set(Calendar.MILLISECOND, 0);

    if (endCal.compareTo(beginCal) < 0) {
        logger.error("Error: endTime cannot be less than beginTime. endTime: "
                + sdfsec.format(endCal.getTime()) + ". beginTime: " + sdfsec.format(beginCal.getTime()));
        return prefixList;
    }

    // Second-granularity prefixes until the next full minute.
    beginCal.add(Calendar.SECOND, 1);
    while (beginCal.get(Calendar.SECOND) != 0) {
        prefixList.add(targetPrefix + "-" + sdfsec.format(beginCal.getTime()));
        beginCal.add(Calendar.SECOND, 1);
    }

    // Minute-granularity prefixes until the next full hour.
    if (endCal.compareTo(beginCal) > 0) {
        while (beginCal.get(Calendar.MINUTE) != 0) {
            prefixList.add(targetPrefix + "-" + sdfmin.format(beginCal.getTime()));
            beginCal.add(Calendar.MINUTE, 1);
        }
    }

    // Hour-granularity prefixes up to and including the end hour.
    while (endCal.compareTo(beginCal) >= 0) {
        prefixList.add(targetPrefix + "-" + sdfhr.format(beginCal.getTime()));
        beginCal.add(Calendar.HOUR_OF_DAY, 1);
    }

    return prefixList;
}
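
A minimal usage sketch (the timestamps and the "mylogs" prefix are made-up values; the actual listing of objects depends on whichever S3 client library you use):

Timestamp begin = Timestamp.valueOf("2008-09-18 10:25:15");
Timestamp end = Timestamp.valueOf("2008-09-18 13:00:00");
for (String prefix : getPrefixList(begin, end, "mylogs")) {
    // list the objects under this prefix with your S3 client of choice
    System.out.println(prefix);
}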

Thursday, September 11, 2008

Ignore log4j logger statements for specific classes

We use certain 3rd party libraries, and during development we might want to log only the statements generated by our application and not by the third party library.

To set the log level of log requests from such libraries, put the following snippet in log4j.xml file.


<logger name="org.thirdpartylib">
  <!-- Print only messages of level WARN or above from the package org.thirdpartylib -->
  <level value="WARN"/>
</logger>



Replace org.thirdpartylib with the appropriate package name. That should do the magic.

PS: org.thirdpartylib.* won't work.
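
If your project uses a log4j.properties file instead of log4j.xml, the equivalent should be a single line (to the best of my recollection of log4j 1.x):

log4j.logger.org.thirdpartylib=WARN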