Wednesday, 29 August 2012

array of file handles in Perl

I used the following code today which creates, prints to and closes an array of file-handles using the IO::File core module. I had over 100K of files in the same directory and had to process them (and send them into their own directory too, but that is trivial). This works on Perl 5.14...
#==========================
# Andrew Cave
# 29 aug 2012
# 
#==========================

use IO::File;
#create a blank hash of day values
my %month=( 
'20120712' => '','20120713' => '','20120714' => '',
'20120715' => '','20120716' => '','20120717' => '',
'20120718' => '','20120719' => '','20120720' => '',
'20120721' => '','20120722' => '','20120723' => '',
'20120724' => '','20120725' => '','20120726' => '',
'20120727' => '','20120728' => '','20120729' => '',
'20120730' => '','20120731' => '','20120801' => '',
'20120802' => '','20120803' => '','20120804' => '',
'20120805' => '','20120806' => '','20120807' => '',
'20120808' => '','20120809' => '','20120810' => '',
'20120811' => '','20120812' => '','20120813' => '',
'20120814' => '','20120815' => '','20120816' => '',
'20120817' => '','20120818' => '','20120819' => '',
'20120820' => '','20120821' => '','20120822' => '',
'20120823' => '','20120824' => '','20120825' => '',
'20120826' => '','20120827' => '','20120828' => '' 
);
my ($key,$val) ;
#open a filename based on the array values and
#put the file-handle reference in the hash, using the dayvalue 
#as the key

while (  ($key,$val) = each(%month)) {
 my $filename = "output_$key.dat";
 my $fh = IO::File->new("> $filename") 
      or die "failed to create $filename $@\n$!\n";
 $month{$key} = $fh;
};

open IN , '<', 'input.dat';
while(){
 chomp;
 my $line = $_;
 if(/\d{8}/){# if the line has an eight digit number
   my $day = $1; 
   my $fh = $month{$day}; # get the correct filehandle
   print {$fh} "$line\n" ; #print to the filehandle - note the braces
                #move the file into its own directory
                #and add '.done' to the name
  if(! -d "./$day/"){
   system("mkdir ./$day/");
  }
  #print "$file|./$day/\n"; # the mv statements (if we want a record)
  system("mv $file ./$day/$file.done");

 }

}
close IN;

#close the array of files we created
# using IO::File
while (($key,$val)=each(%month)){
 my $close_fh =  $month{$key};
 $close_fh->close();
}

Wednesday, 28 March 2012

Create a Calendar listing in pure Oracle SQL

Just fiddling around...create a calendar for the next 1000 days starting with the 1st day of the current year. The calendar is the standard format putting Sundays in the left-most column a la :

2012 APR SUN MON TUE WED THU FRI SAT
2012 APR 1 2 3 4 5 6 7
2012 APR 8 9 10 11 12 13 14
2012 APR 15 16 17 18 19 20 21
2012 APR 22 23 24 25 26 27 28
2012 APR 29 30
2012 AUG SUN MON TUE WED THU FRI SAT
2012 AUG 1 2 3 4
2012 AUG 5 6 7 8 9 10 11
2012 AUG 12 13 14 15 16 17 18
2012 AUG 19 20 21 22 23 24 25
2012 AUG 26 27 28 29 30 31
2012 DEC SUN MON TUE WED THU FRI SAT
2012 DEC 1
2012 DEC 2 3 4 5 6 7 8
2012 DEC 9 10 11 12 13 14 15
2012 DEC 16 17 18 19 20 21 22
2012 DEC 23 24 25 26 27 28 29
2012 DEC 30 31




/*generate a calendar in Oracle using pure SQL
-- Andrew Cave andrew.cave.blogging at gmail-- 29/3/2012*/
with x as (select -- generate the days
  trunc(trunc(sysdate,'YEAR')+level-1) date_ -- add days to 1st day of current year
  from dual connect by level <= 1000 -- number of days to go forward
) 
select yr,monthname,SUN,MON,TUE,WED,THU,FRI,SAT from 
(
select yr
,monthname,MON_NUM,row_
,SUN,MON,TUE,WED,THU,FRI,SAT
from (
select -- display the dates
yr
,monthname,MON_NUM,row_
, to_char(max(decode(dayname,'SUN',dom,null))) SUN
, to_char(max(decode(dayname,'MON',dom,null))) MON
, to_char(max(decode(dayname,'TUE',dom,null))) TUE
, to_char(max(decode(dayname,'WED',dom,null))) WED
, to_char(max(decode(dayname,'THU',dom,null))) THU
, to_char(max(decode(dayname,'FRI',dom,null))) FRI
, to_char(max(decode(dayname,'SAT',dom,null))) SAT
from (
select 
yr
,mon monthname
,dayname
,dom
,MON_NUM
,trunc((dom+ fdom-1)/7)-trunc(fdom/7)  row_ -- what week of the month
,fdom 
from 
  (
  select date_ -- get date values for calculations and display
    , to_char(date_,'DY') dayname
    , to_char(date_,'YYYY') yr  
    , to_char(date_,'MON') MON
    , to_char(date_,'MM') MON_NUM
    , to_number(to_char(date_,'DD')) dom
    , to_number(to_char(trunc(date_,'MM'),'D')) fdom -- day value of 1st day of month
  from x
  ) x1
) x2 group by yr,monthname,row_,MON_NUM
) x3 
union
select distinct -- this select does the day name column headers for the calendar
  to_char(date_,'YYYY') yr
  , to_char(date_,'MON') MONthname
  , to_char(date_,'MM') MON_NUM
  , -1 row_
  ,'SUN','MON','TUE','WED','THU','FRI','SAT'
from x
) x4  order by yr, MON_NUM, row_

Wednesday, 21 March 2012

Building a mysqldump script to back up your database

Sometimes, I want to backup the mySQL database but not everything...using mysqldump we can develop a little script to organise what we need.

select
concat('mysqldump -u<uid> -p<passwd> --max_allowed_packet=256M ',
table_schema,' ', group_concat(table_name ORDER BY table_name DESC SEPARATOR ' ')
,' > /tmp/mysqldump_20120319_',table_schema
) tables_ from information_schema.TABLES where table_schema not in ('information_schema','mysql','db1','db2','db2','db3')
and table_type='BASE TABLE'
group by table_schema


This returns a series of command strings for mysqldump that lists each table in the non-excluded schemas and dumps them into a dump file for each schema. This means a table name can be deleted if you don't want/need to have it backed-up enough to justify storing a (e.g) 20GB insert statement.

These can be pasted into a file and run with sh and cron ....or in a batch file in Windows.

Adjust the UID, pwd, dumpfile name/location and database schemas as needed.

Monday, 12 March 2012

Restrict number of rows returned - Sybase IQ

Often when you are trying to figure out a database, you will look at a table just to see what is in it. Some SQL tools return every single row on a "select * from table" COUGH!*TOAD for Data Analysts*!COUGH which is a pain when you have 3 million rows.

In Oracle you can use ROWNUM < 100 to return the first 99 rows and in MySQL 'limit 99' does the same. But for Sybase IQ? Nothing so simple. But today I discovered that there is a way to do it using the ROWID function:
select * from <SCHEMA>.<TABLE_NAME> where rowid("<TABLE_NAME>") < 100

Thursday, 8 March 2012

Business Days between 2 dates in Sybase IQ

This is just ridiculously complex in Sybase compared to the others. But it CAN be done:





-- get the work days between the mondays follwoing each date
abs(datediff(day,(dateadd(dd,1,dateceiling(wk,dt1)) ), (dateadd(dd,1,dateceiling(wk,dt2)) ))) -  round(abs(datediff(day,(dateadd(dd,1,dateceiling(wk,dt1)) ) , (dateadd(dd,1,dateceiling(wk,dt2)) )))/7,0)*2 +
--add on the working days to the following monday for the first date
case when (5 - case when cast(dateformat(dt1,'d') as numeric) = 1 then 7 else  cast(dateformat(dt1,'d') as numeric)-1 end ) < 0 
        then 0 else (5 - case when cast(dateformat(dt1,'d') as numeric) = 1 then 7 else  cast(dateformat(dt1,'d') as numeric)-1 end ) end -
--take off the working days to the follwoing monday for the second date   
   case when (5 - case when cast(dateformat(dt2,'d') as numeric) = 1 then 7 else  cast(dateformat(dt2,'d') as numeric)-1  end) < 0 
        then 0 else  (5 - case when cast(dateformat(dt2,'d') as numeric) = 1 then 7 else  cast(dateformat(dt2,'d') as numeric)-1  end) end biz_days_between


Replace dt1 and dt2 with your dates. Make dt1 < dt2 otherwise I think it will break. I didn't bother testing because coding this took away my will to live.

Tuesday, 6 March 2012

MySQL useful dates

You get asked weird things in reporting, but the worst are translating the nice, neat, sensible dates in database tables into the strange ways that people think about dates.

Here is some MySQL SQL that returns some useful relative dates:
  • start of the week
  • start of last week
  • start of the month
  • end of previous month
  • start of the previous month


select
date(@dt) base_date,
date(@dt) - interval weekday(@dt) day week_start,
date(@dt) - interval weekday(@dt) day - interval 1 week last_week_start,
date(@dt) - interval (dayofmonth(@dt)-1) day month_start,
date(@dt) - interval (dayofmonth(@dt)) day end_prev_mth,
(date(@dt) - interval (dayofmonth(@dt)-1) day) - interval 1 month start_prev_mth
from
(select @dt := now()) x

Beginning of financial year in Oracle

In Australia, the financial year runs from 1 July to 30 June. When doing financial reports, sometimes you need to know the financial year current at the time. Oracle (amazingly) makes it easy...


select
case
when mth < 4 then trunc(add_months(dt,-6),'Q') when mth < 7 then trunc(add_months(dt,-9),'Q') when mth < 10 then trunc(dt,'Q') else trunc(add_months(dt,-3),'Q') end fin_year /*return the beginning of the current financial year*/ from (select dt, to_number(to_char(dt ,'MM')) mth from ( select sysdate dt from dual) xx ) xx1

Business days between 2 dates in Oracle

A request I get occasionally is how many business days between 2 arbitrary dates. Oracle is, of course, not at all helpful here. So here is SQL to do it for you (2 methods). The second method relies on 'CONNECT BY' syntax being available so not less than Oracle 10 I'm afraid. The other one is more complex, but should work on any version.

The 2 different methods give slightly different results when the start dates are on Sat/Sun - differing interpretation of when you start counting business days from. The second method is a bit of a hack and only covers 18 months in the past and 18 months in the future. To expand that, change the 500 to the number of days ago you want to start from and the 1000 to the number of days you need to go forward from there. Ridiculously complex I know. but most of it is handling weekend dates.



select
abs( case when to_char(dt2,'d') > 5 then trunc(dt2,'d')+7 else dt2 end -
case when to_char(dt1,'d') > 5 then trunc(dt1,'d')+7 else dt1 end ) -
abs(trunc((trunc( case when to_char(dt2,'d') > 5 then trunc(dt2,'d')+7 else dt2 end ,'d') -
trunc(case when to_char(dt1,'d') > 5 then trunc(dt1,'d')+7 else dt1 end ,'d'))/7)
)*2
method1


, greatest((select count(*) from (select sysdate - 500 + level - 1 dd from dual connect by level< 1000) w where w.dd between dt1 and dt2 and to_char(w.dd,'d') < 6)-1,0) method2

,abs((trunc(dt1,'d')+7) - (trunc(dt2,'d')+7)) - trunc(abs((trunc(dt1,'d')+7) - (trunc(dt2,'d')+7))/7)*2 +
greatest(5 - to_char(dt1,'d'),0) -
greatest(5 - to_char(dt2,'d'),0)
method3


from
(
select to_date('24/08/2011','dd/mm/yyyy') dt1 , to_date('25/8/2011','dd/mm/yyyy') dt2 from dual
union select to_date('25/11/2011','dd/mm/yyyy') dt1 , to_date('28/11/2011','dd/mm/yyyy') dt2 from dual
union select to_date('13/12/2011','dd/mm/yyyy') dt1 , to_date('26/12/2011','dd/mm/yyyy') dt2 from dual
union select to_date('21/12/2011','dd/mm/yyyy') dt1 , to_date('11/01/2012','dd/mm/yyyy') dt2 from dual
union select to_date('22/12/2011','dd/mm/yyyy') dt1 , to_date('13/01/2012','dd/mm/yyyy') dt2 from dual
union select to_date('28/12/2011','dd/mm/yyyy') dt1 , to_date('25/01/2012','dd/mm/yyyy') dt2 from dual
) c
order by c.dt1