Thursday, September 18, 2008

Get log files from S3 that were generated within a given timeframe

Amazon S3 log files are named as:

TargetPrefix-YYYY-MM-DD-HH-MM-SS-UniqueNumber

TargetPrefix is what we chose
YYYY-MM-DD-HH-MM-SS is the timestamp
UniqueNumber is something we should not care about
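
To make the naming scheme concrete, here is a minimal sketch that pulls the timestamp back out of a log file name. It assumes the UniqueNumber part contains no hyphen, so the name is read from the right; the class name LogKeyParser and the sample key "mylogs-..." are made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LogKeyParser {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");
    // Length of a formatted "yyyy-MM-dd-HH-mm-ss" timestamp.
    private static final int STAMP_LENGTH = 19;

    /** Extracts the timestamp from a name shaped like TargetPrefix-<timestamp>-UniqueNumber. */
    static LocalDateTime timestampOf(String key) {
        // Assumption: UniqueNumber has no hyphen, so the last '-' ends the timestamp.
        int lastDash = key.lastIndexOf('-');
        if (lastDash < STAMP_LENGTH) {
            throw new IllegalArgumentException("Not a log file name: " + key);
        }
        String stamp = key.substring(lastDash - STAMP_LENGTH, lastDash);
        return LocalDateTime.parse(stamp, STAMP);
    }

    public static void main(String[] args) {
        // Hypothetical key; prints 2008-09-18T10:25:16
        System.out.println(timestampOf("mylogs-2008-09-18-10-25-16-ABC123"));
    }
}
```

Reading from the right means the TargetPrefix itself may contain hyphens without confusing the parser.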

S3 returns a list of the objects (files) in a bucket whose names start with a given prefix.
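
The effect of a prefix can be mimicked without calling S3 at all. The sketch below filters an in-memory list of names the way a prefix listing would; the class PrefixFilterDemo and the sample names are hypothetical, and no real S3 client is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixFilterDemo {
    /** Returns the names that start with the given prefix, as a prefix listing would. */
    static List<String> withPrefix(List<String> keys, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical log file names following TargetPrefix-<timestamp>-UniqueNumber.
        List<String> keys = Arrays.asList(
                "mylogs-2008-09-18-10-25-16-AAA",
                "mylogs-2008-09-18-10-26-03-BBB",
                "mylogs-2008-09-18-10-26-41-CCC",
                "mylogs-2008-09-18-11-02-10-DDD");
        // A minute-level prefix matches every file written during that minute.
        // prints [mylogs-2008-09-18-10-26-03-BBB, mylogs-2008-09-18-10-26-41-CCC]
        System.out.println(withPrefix(keys, "mylogs-2008-09-18-10-26"));
    }
}
```

A shorter prefix such as "mylogs-2008-09-18-10" would match the first three names, which is why coarser prefixes return longer lists.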

Now, to get a list of log files that were created within a specific timeframe, we can use the following algorithm:

beginTime: YYYY-MM-DD-10:25:15
endTime: YYYY-MM-DD-13:00:00

1) get all files with prefix YYYY-MM-DD-10-25-16 to YYYY-MM-DD-10-25-59 (one prefix per second)
2) get all files with prefix YYYY-MM-DD-10-26 to YYYY-MM-DD-10-59 (one prefix per minute)
3) get all files with prefix YYYY-MM-DD-11 and YYYY-MM-DD-12 (one prefix per hour)
4) get all files with prefix YYYY-MM-DD-13

One can further optimise the algorithm to get files on a 'day of the month' basis. But that would not be a good idea, because the list of files received from S3 would be very long.

Following is a Java method that implements the above logic:


import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

/**
 * Returns a list of prefixes. The list runs from (beginTime + 1 second) to
 * the hour value of endTime. The minute and second values of endTime are
 * ignored.
 *
 * @param beginTime    start of the timeframe (exclusive)
 * @param endTime      end of the timeframe; only its date and hour are used
 * @param targetPrefix the TargetPrefix chosen for the S3 log files
 * @return the prefixes to request from S3, or an empty list if endTime
 *         (truncated to the hour) is earlier than beginTime
 */
public List<String> getPrefixList(Timestamp beginTime, Timestamp endTime, String targetPrefix) {
    List<String> prefixList = new ArrayList<String>();

    SimpleDateFormat sdfhr = new SimpleDateFormat("yyyy-MM-dd-HH");
    SimpleDateFormat sdfmin = new SimpleDateFormat("yyyy-MM-dd-HH-mm");
    SimpleDateFormat sdfsec = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");

    Calendar beginCal = Calendar.getInstance();
    beginCal.setTimeInMillis(beginTime.getTime());
    beginCal.set(Calendar.MILLISECOND, 0);

    // Truncate endTime to the start of its hour.
    Calendar endCal = Calendar.getInstance();
    endCal.setTimeInMillis(endTime.getTime());
    endCal.set(Calendar.MINUTE, 0);
    endCal.set(Calendar.SECOND, 0);
    endCal.set(Calendar.MILLISECOND, 0);

    if (endCal.compareTo(beginCal) < 0) {
        // logger is a field of the enclosing class.
        logger.error("Error: endTime (truncated to the hour) cannot be earlier than beginTime. endTime: "
                + sdfsec.format(endCal.getTime()) + ". beginTime: " + sdfsec.format(beginCal.getTime()));
        return prefixList;
    }

    // Step 1: one prefix per second until the next full minute.
    beginCal.add(Calendar.SECOND, 1);
    while (beginCal.get(Calendar.SECOND) != 0) {
        prefixList.add(targetPrefix + "-" + sdfsec.format(beginCal.getTime()));
        beginCal.add(Calendar.SECOND, 1);
    }

    // Step 2: one prefix per minute until the next full hour.
    if (endCal.compareTo(beginCal) > 0) {
        while (beginCal.get(Calendar.MINUTE) != 0) {
            prefixList.add(targetPrefix + "-" + sdfmin.format(beginCal.getTime()));
            beginCal.add(Calendar.MINUTE, 1);
        }
    }

    // Steps 3 and 4: one prefix per hour, up to and including the hour of endTime.
    while (endCal.compareTo(beginCal) >= 0) {
        prefixList.add(targetPrefix + "-" + sdfhr.format(beginCal.getTime()));
        beginCal.add(Calendar.HOUR_OF_DAY, 1);
    }

    return prefixList;
}
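
To try the logic on the example timeframe without a logger or an enclosing class, here is a self-contained sketch of the same steps written with java.time (Java 8 or later). It is a variant for illustration, not the method above: the class name PrefixListSketch, the prefix "mylogs" and the date 2008-09-18 are assumptions, and it returns an empty list silently instead of logging an error.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

class PrefixListSketch {
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
    private static final DateTimeFormatter SECOND = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");

    static List<String> prefixes(LocalDateTime begin, LocalDateTime end, String targetPrefix) {
        List<String> result = new ArrayList<>();
        LocalDateTime endHour = end.truncatedTo(ChronoUnit.HOURS);
        LocalDateTime cursor = begin.truncatedTo(ChronoUnit.SECONDS);
        if (endHour.isBefore(cursor)) {
            return result; // end hour lies before begin: nothing to fetch
        }
        // One prefix per second until the next full minute.
        cursor = cursor.plusSeconds(1);
        while (cursor.getSecond() != 0) {
            result.add(targetPrefix + "-" + cursor.format(SECOND));
            cursor = cursor.plusSeconds(1);
        }
        // One prefix per minute until the next full hour.
        if (endHour.isAfter(cursor)) {
            while (cursor.getMinute() != 0) {
                result.add(targetPrefix + "-" + cursor.format(MINUTE));
                cursor = cursor.plusMinutes(1);
            }
        }
        // One prefix per hour, up to and including the hour of end.
        while (!endHour.isBefore(cursor)) {
            result.add(targetPrefix + "-" + cursor.format(HOUR));
            cursor = cursor.plusHours(1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = prefixes(
                LocalDateTime.of(2008, 9, 18, 10, 25, 15),
                LocalDateTime.of(2008, 9, 18, 13, 0, 0),
                "mylogs");
        // prints: 81 prefixes, from mylogs-2008-09-18-10-25-16 to mylogs-2008-09-18-13
        System.out.println(list.size() + " prefixes, from " + list.get(0)
                + " to " + list.get(list.size() - 1));
    }
}
```

The 81 prefixes are 44 per-second, 34 per-minute and 3 per-hour entries. Because the last entry is a whole-hour prefix, it also matches files written later in that hour than endTime, so a caller that needs an exact cut-off would still have to filter on the timestamp in each file name.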

Thursday, September 11, 2008

Ignore log4j logger statements for specific classes

We use certain third-party libraries, and during development we might want to see only the log statements generated by our own application, not those from the third-party libraries.

To set the log level for log requests coming from such libraries, put the following snippet in the log4j.xml file.


<logger name="org.thirdpartylib">
    <!-- Print only messages of level WARN or above from the package org.thirdpartylib -->
    <level value="WARN"/>
</logger>

Replace org.thirdpartylib with the appropriate package name. That should do the magic.

PS: org.thirdpartylib.* won't work. Logger names are hierarchical, so the package name on its own already covers every class beneath it.
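
The way a level set on a package name reaches the classes below it can be seen in a small runnable sketch. log4j is not part of the JDK, so this uses java.util.logging instead, which names its loggers in a similar dotted hierarchy; it is an analogy rather than log4j's own API (note WARNING instead of WARN), and the logger names are made up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggerHierarchyDemo {
    // Kept in fields: java.util.logging may discard loggers that nothing references.
    static final Logger LIBRARY = Logger.getLogger("org.thirdpartylib");
    static final Logger LIBRARY_CLASS = Logger.getLogger("org.thirdpartylib.http.Client");

    static {
        // Set the level once, on the package-level logger only.
        LIBRARY.setLevel(Level.WARNING);
    }

    /** Tells whether a message at the given level from the library class would be logged. */
    static boolean libraryClassLogs(Level level) {
        return LIBRARY_CLASS.isLoggable(level);
    }

    public static void main(String[] args) {
        // The child logger has no level of its own and inherits WARNING from its parent.
        System.out.println("INFO logged: " + libraryClassLogs(Level.INFO));
        System.out.println("WARNING logged: " + libraryClassLogs(Level.WARNING));
    }
}
```

No wildcard is involved anywhere: the logger named after the package is simply the parent of every logger whose name starts with that package.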