Thursday, September 18, 2008

Get log files from S3 that were generated within a given timeframe

Amazon S3 log files are named as:

TargetPrefix-YYYY-MM-DD-HH-MM-SS-UniqueNumber

TargetPrefix is the prefix we chose for the log files
YYYY-MM-DD-HH-MM-SS is the timestamp
UniqueNumber is a unique suffix that we do not need to care about
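
To make the naming scheme concrete, here is a minimal sketch that pulls the timestamp out of a key name. It assumes the layout described above (TargetPrefix, a dash, a 19-character timestamp, then the unique suffix); the class and method names are made up for illustration and are not part of any S3 library.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only.
final class S3LogKeyParser {
    // Timestamp layout used in the key names described above.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");
    // Length of "YYYY-MM-DD-HH-MM-SS".
    private static final int STAMP_LENGTH = 19;

    /** Returns the timestamp embedded in TargetPrefix-YYYY-MM-DD-HH-MM-SS-UniqueNumber. */
    static LocalDateTime timestampOf(String key, String targetPrefix) {
        String head = targetPrefix + "-";
        if (!key.startsWith(head) || key.length() < head.length() + STAMP_LENGTH) {
            throw new IllegalArgumentException("Unexpected key format: " + key);
        }
        // Everything after the timestamp (the unique suffix) is ignored.
        String stamp = key.substring(head.length(), head.length() + STAMP_LENGTH);
        return LocalDateTime.parse(stamp, STAMP);
    }
}
```

For example, calling timestampOf("logs-2008-09-18-10-25-16-ABCDEF123", "logs") yields the date-time 2008-09-18 10:25:16.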

S3 returns a list of the objects (files) in a bucket whose names begin with a given prefix.
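
The effect of a prefix can be sketched without touching S3 at all. The snippet below only imitates a prefix listing on an in-memory list of key names; a real client would send the prefix to S3 instead. The class name, method name, and sample keys are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a prefix listing; not an S3 client call.
final class PrefixListingDemo {
    /** Returns the keys that start with the given prefix, in their original order. */
    static List<String> listByPrefix(List<String> keys, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

With the sample keys logs-2008-09-18-10-25-16-A1, logs-2008-09-18-10-26-03-B2 and logs-2008-09-18-11-00-00-C3, the prefix logs-2008-09-18-10 matches the first two keys, while the longer prefix logs-2008-09-18-10-26 matches only the second. A shorter prefix covers a wider time window.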

Now, to get the list of log files that were created within a specific timeframe, we can use the following algorithm:

beginTime: YYYY-MM-DD-10:25:15
endTime: YYYY-MM-DD-13:00:00

1) get all files with prefixes YYYY-MM-DD-10-25-16 to YYYY-MM-DD-10-25-59 (one prefix per second)
2) get all files with prefixes YYYY-MM-DD-10-26 to YYYY-MM-DD-10-59 (one prefix per minute)
3) get all files with prefixes YYYY-MM-DD-11 and YYYY-MM-DD-12 (one prefix per hour)
4) get all files with prefix YYYY-MM-DD-13

One could further optimise the algorithm to fetch files on a 'day of the month' basis. But that would not be a good idea, because the list of files received from S3 would be very long.

Following is a Java method that implements the above logic:


// Imports needed by the method (Timestamp is assumed to be java.sql.Timestamp).
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

/**
 * Returns a list of key prefixes. The list runs from (beginTime + 1 second) to the
 * hour value of endTime. The minute and second values of endTime are ignored.
 * Assumes the enclosing class has a logger field named "logger".
 *
 * @param beginTime start of the range (exclusive)
 * @param endTime end of the range; only its date and hour are used
 * @param targetPrefix the TargetPrefix of the S3 log files
 * @return the prefixes to request, or an empty list if endTime is before beginTime
 */
public List<String> getPrefixList(Timestamp beginTime, Timestamp endTime, String targetPrefix) {
    List<String> prefixList = new ArrayList<String>();

    // The formatters use the JVM default time zone.
    SimpleDateFormat sdfhr = new SimpleDateFormat("yyyy-MM-dd-HH");
    SimpleDateFormat sdfmin = new SimpleDateFormat("yyyy-MM-dd-HH-mm");
    SimpleDateFormat sdfsec = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");

    Calendar beginCal = Calendar.getInstance();
    beginCal.setTimeInMillis(beginTime.getTime());
    beginCal.set(Calendar.MILLISECOND, 0);

    // Truncate endTime to the start of its hour.
    Calendar endCal = Calendar.getInstance();
    endCal.setTimeInMillis(endTime.getTime());
    endCal.set(Calendar.MINUTE, 0);
    endCal.set(Calendar.SECOND, 0);
    endCal.set(Calendar.MILLISECOND, 0);

    if (endCal.compareTo(beginCal) < 0) {
        logger.error("Error: endTime (truncated to the hour) cannot be earlier than beginTime. endTime: "
                + sdfsec.format(endCal.getTime()) + ". beginTime: " + sdfsec.format(beginCal.getTime()));
        return prefixList;
    }

    // Step 1: one prefix per second, up to the next minute boundary.
    beginCal.add(Calendar.SECOND, 1);
    while (beginCal.get(Calendar.SECOND) != 0) {
        prefixList.add(targetPrefix + "-" + sdfsec.format(beginCal.getTime()));
        beginCal.add(Calendar.SECOND, 1);
    }

    // Step 2: one prefix per minute, up to the next hour boundary.
    if (endCal.compareTo(beginCal) > 0) {
        while (beginCal.get(Calendar.MINUTE) != 0) {
            prefixList.add(targetPrefix + "-" + sdfmin.format(beginCal.getTime()));
            beginCal.add(Calendar.MINUTE, 1);
        }
    }

    // Steps 3 and 4: one prefix per hour, up to and including the hour of endTime.
    while (endCal.compareTo(beginCal) >= 0) {
        prefixList.add(targetPrefix + "-" + sdfhr.format(beginCal.getTime()));
        beginCal.add(Calendar.HOUR_OF_DAY, 1);
    }

    return prefixList;
}
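
For readers on Java 8 or later, the same steps can also be sketched with java.time, which avoids the mutable Calendar objects. This is not the method above but a rough equivalent under stated assumptions: the class and method names are invented, logging is left out, and the LocalDateTime values are assumed to be in the same time zone as the timestamps in the log file names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical java.time variant of the prefix-building logic.
final class PrefixListSketch {
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
    private static final DateTimeFormatter SECOND = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");

    static List<String> prefixes(LocalDateTime begin, LocalDateTime end, String targetPrefix) {
        List<String> result = new ArrayList<>();
        LocalDateTime cursor = begin.truncatedTo(ChronoUnit.SECONDS);
        LocalDateTime endHour = end.truncatedTo(ChronoUnit.HOURS);
        if (endHour.isBefore(cursor)) {
            return result; // end (truncated to the hour) is before begin: nothing to fetch
        }
        // One prefix per second, up to the next minute boundary.
        cursor = cursor.plusSeconds(1);
        while (cursor.getSecond() != 0) {
            result.add(targetPrefix + "-" + SECOND.format(cursor));
            cursor = cursor.plusSeconds(1);
        }
        // One prefix per minute, up to the next hour boundary.
        if (endHour.isAfter(cursor)) {
            while (cursor.getMinute() != 0) {
                result.add(targetPrefix + "-" + MINUTE.format(cursor));
                cursor = cursor.plusMinutes(1);
            }
        }
        // One prefix per hour, up to and including the hour of end.
        while (!endHour.isBefore(cursor)) {
            result.add(targetPrefix + "-" + HOUR.format(cursor));
            cursor = cursor.plusHours(1);
        }
        return result;
    }
}
```

For the example timeframe (10:25:15 to 13:00:00 on the same day) this produces 81 prefixes: 44 second-level, 34 minute-level and 3 hour-level. That is 81 list requests instead of one request per second of the window.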
