Wednesday, May 18, 2011

AutoIncrement sequences in Oracle/Postgres/MySql

Came across a situation, where I need to perform Auto insert on a Table's column in a sequential key order, and that too be compatible with Oracle, Postgres and MySql.

And here is how the comparision goes....,

Ofcourse we can use TableMax + 1, but the overhead is the performance...So decided to go with Pseudo columns..

Pseudocolumn behaves like a table column, but is not actually stored in the table. You can select from pseudocolumns, but you cannot insert, update, or delete their values.
Thus suggests the use of Sequences., that can generate unique sequential values. These values are often used for primary and unique keys. You can refer to sequence values in SQL statements with these pseudocolumns:

    * CURRVAL: Returns the current value of a sequence
    * NEXTVAL: Increments the sequence and returns the next value

A sequence can be accessed by many users concurrently with no waiting or locking.

Create Sequence:

CREATE SEQUENCE sequence_name
    MINVALUE value
    MAXVALUE value
    START WITH value
    INCREMENT BY value
    CACHE value;

Sequences are independant of any table and can be used to keep a value unique across a number of tables.

MySQL doesn't currently support sequences. There is no sequence number generator (CREATE SEQUENCE) in MySQL. However it does have an auto increment value which can be applied to a primary key of a table. This is done during the table creation.

mysql> create table seq_test (id int primary key auto_increment, name  varchar(21));

As the name suggests the value is assigned automatically, this is in contrast to Oracle where we have to call the nextval function of the sequence to return the value when we need it.

In Postgres you can use type SERIAL during table creation which is similar to AUTO_INCREMENT in MySQL.

pgsql=> CREATE TABLE seq_test (id    SERIAL PRIMARY KEY);

 Is just shorthand notation for:

pgsql=> CREATE SEQUENCE XXX START 1;
pgsql=> CREATE TABLE seq_test (id integer PRIMARY KEY DEFAULT nextval('XXX'));

Well the shorthand notation has a minor gotcha: you cannot drop the sequence that has been created automatically. You need to drop the column itself.
Thats the difference between a SERIAL PRIMARY KEY definition and the "verbose" mode.

Verbosely you can have even more control over the sequence. With SERIAL the default is something like

CREATE SEQUENCE XXX
    INCREMENT BY 1
    NO MAXVALUE
    NO MINVALUE
    CACHE 1;

By hand you can define e.g.

CREATE SEQUENCE XXX
    START n
    INCREMENT BY n
    MAXVALUE n
    MINVALUE n
    CACHE 1;

Insert in to Table using seq nextVal.

pgsql=> insert into seq_test(id, name)values(nextval('s1'), test1);
INSERT 0 1
pgsql=> select * from seq_test;
 id          name
------------------
  1          test1
(1 row)

We can also insert value of our choosing.

pgsql=> insert into seq_test(id, name)values('100', 'test2');
INSERT 0 1
pgsql=> select * from seq_test;
 id             name
-------------------------
   1            test1
 100            test2
(2 rows)

Thus, the sequence will start from its own higher point, not related to any of the table values.

pgsql=> insert into seq_test(id, name)values(nextval('s1'), 'test3');
INSERT 0 1
pgsql=> select * from seq_test;
 id             name
---------------------------
   1           test1
 100           test2
   2           test3
(3 rows)

However the same is different in MySQL If you use AUTO_INCREMENT.

mysql> insert into seq_test (name) values ('test1');

mysql> select * from seq_test;
+----+------+
| id | name |
+----+------+
|  1 | test1 |
+----+------+
1 row in set

However you can override the value of your choice.

mysql> insert into seq_test (id,name) values (100,'test2');
Query OK, 1 row affected

mysql> select * from seq_test;
+-----+------+
| id  | name |
+-----+------+
|   1 | test1 |
| 100 | test2 |
+-----+------+
2 rows in set

The sequence will then start from this new higher point.

mysql> insert into seq_test (name) values ('test3');
Query OK, 1 row affected

mysql> select * from seq_test;
+-----+-------+
| id  | name  |
+-----+-------+
|   1 | test1  |
| 100 | test2  |
| 101 | test3 |
+-----+-------+
3 rows in set

Also you can reset the sequence using an alter table command.

mysql> alter table seq_test auto_increment = 100;
Query OK, 1 row affected

This method could also be used to assign a higher number to start the sequence rather than starting with 1 by calling the alter table straight after the table creation.

So in summary auto_increment offers a great way of assigning a unique value automatically to a table. However what it doesn't allow when compared with an Oracle Sequence is different increment values, ability to use across a number of tables.
Oracle doesn't provide either AUTO_INCREMENT like MySQL or SERIAL like Postgres.,
If you look an equivalent to AUTO_INCREMENT in MySQL.,

-- Won't work:
Sql>CREATE TABLE   (seq_id NUMBER(1) DEFAULT xxx_seq.NEXTVAL);
ORA-00984: column not allowed here

Then how to create an autoincrement field in a table with a sequence ...

-- Will work:

Sql> create table seq_test(id number , name varchar2(21));
Statement processed.

First create a sequence
Sql> create sequence xxx ;
Statement processed.

Then create the trigger.
create trigger yyy before insert on seq_test
for each row
when (new.id is null)
begin
 select xxx.nextval into :new.id from dual;
end;
/
Also, the below works as like Postgres...

Sql>INSERT INTO seq_test(id, name)VALUES(xxx_seq.nextval, 'test1');

However, setting the default vlaue while Table creation to sequence.nextVal in Postgres will work unlike in Oracle.

pgsql=>create sequence xxx;

pgsql=>create table seq_test(id INT DEFAULT nextval('s1') NOT NULL, name varchar(21));

pgsql=> insert into seq_test(name)values('test4');

pgsql=> select * from seq_test;
id | name
----+-------
  3 | test4
(1 row)


Decided to use sequences while Insert.........


Tuesday, May 17, 2011

Execute PL/SQL on Oracle from Python Script

I had a chance to write a Python script that connects to Oracle and do some SQL..
I thought of bloging the basic steps., that I followed..

Install Oracle client (below packages), If the box doesn't have Oracle Instance., to run the script.

The local box requires the below packages to be installed., to run the script.

oracle-instantclient11.2-basic-11.xxxxx.xx.zip
cx_Oracle-5.xxx.tar.gz

Once the installation is done, write the script to connect to Oracle and do process.

#The Script uses the cx_Oracle to connect to the Oracle instance.. It can be of a local or remote instance. 

def grantSelectOnOracle():
    try:
        import cx_Oracle
        #Checking for the presence of database
        con = cx_Oracle.connect(u"%s/%s@%s:%s/%s" % ('DATABASE_USERNAME',
                                                     'DATABASE_PASSWORD',
                                                     'DATABASE_HOST',
                                                     'DATABASE_PORT',
                                                     'DATABASE_NAME'))


#Once the connection is established, the script tries to run a PL/SQL block, here #it grant select permission on the User passed.

        try:
            #Granting select privilege to read only user on all reports tables
            cur = con.cursor()
            grantCommand = """
begin
for i in (select table_name from user_tables)
loop
execute immediate 'grant select on '|| i.table_name||' to %s';
end loop;
end;""" % ( 'DATABASE_READ_ONLY_USER')
            cur.execute(u"%s" % grantCommand)
            cur.close()
            logger.info("Done.")
        except:
            logger.warn("Some Problem Occured in Granting Read Only Permission to User. Please set permissions Manually")
            pass
        con.close()
    except Exception, e:
        logger.error("Connection problem occured with Oracle")
        pass

Here it is., now enjoy with the Python script to execute PlSql's remotely on Oracle......!





Setup remote debugging in Eclipse

Debugging is always an interesting way to understand better how a system works internally.

And will happen with most of opensource projects.

So how can we easily debug ? The answer is : use remote debugging.

To be able to attach your Eclipse debugger to a running Java process you need to start that process with the following Java options…It can be any of your startup files for the application.. may be run.sh..etc

export JAVA_OPTS=

"-Xdebug -Xrunjdwp:transport=dt_socket,address=9000,server=y,suspend=n"

Once you have done this restart the server.

From Eclipse go to the Debug Configuration.






create a new Remote Java Application configuration for the process you want to connect to and set the port number to xxxx(eg - 9000), the same as that of the options.





Enter the hostname for the machine running the Java process. Thats it…you are on debug mode...

Java Program to run Pentaho kettle ETL Job

I got to use Kettle spoon as the ETL tool, and created Jobs and transformations.

and used the Pentaho provided API's to run the Transformations/Jobs from java.

An ETL Job file can be created and get stored in a disk.. the API requires the file path to get passed.


Provided the system has the required Pentaho libraries and the class path.,

The Jars can be downloaded from the Pentaho Site.

First Initialize the Kettle Environment from Java.,

/** static block that initializes the kettle environment */
    static {
        try {
            KettleEnvironment.init(false);
        } catch (KettleException e) {
            smLogger.error("Pentaho.Job.Run.Error.Initialization :" + e.getMessage());
        }
    }


And now pass the Kettle Job File Path to the API.

     /**
     * Executes the Job
     * @param jobPath jobFile
     * @throws ETLRunException
     */
    public void executeJob(String jobPath) throws ETLRunException {

# Initialize the Job Meta.

       try {
            JobMeta jobMeta = new JobMeta(jobPath, null, null);

            // Create the Job Instance
            Job job = new Job(null, jobMeta);
            job.getJobMeta().setInternalKettleVariables(job);

            job.setLogLevel("BASIC");

            job.setName(Thread.currentThread().getName());

            // Start the Job, as it is a Thread itself by Kettle.
            job.start();
            job.waitUntilFinished();

            if (job.getResult() != null && job.getResult().getNrErrors() != 0) {
                smLogger.error("Pentaho.Job.Run.Error.FinishedWithErrors");
                throw new ETLRunException("Pentaho.Job.Run.Error.FinishedWithErrors");
            }

            // Now the job task is finished, mark it as finished.
            job.setFinished(true);

            // Cleanup the parameters used by the job. Post that invoke GC.
            jobMeta.eraseParameters();
            job.eraseParameters();

        } catch (Exception e) {
            smLogger.error("Pentaho.Job.Run.Error.FinishedWithErrors :" + e.getMessage());
             throw new ETLRunException(e);
        }

    }
     

Log memory utilization before and after a Java process run




Sometimes running a Java process requires ..
to calculate the amount of memory it needs/takes to run.

And sometimes logging the same is useful when the job grows in future.

Java API provides a mechanism to log the memory..
In such a way.., We can calculate memory available, used memory, total memory.. etc by using the API provided methods..

Method to get the Total memory available...

Runtime.getRuntime().totalMemory()

Similarly, we can get the available memory by...

Runtime.getRuntime().freeMemory()

and by doing some calculation, we get the memory used by a Process...

Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()

Combining all these data, conclude a method that will log the Total memory allocated and the memory used by the Process.


Now time to log the memory usage before and after a job/process run.

# Write a LogUtil class which has the below method.

Log memory utilization before and after a job run

/**
     * returns the total memory and used memory.
     *
     * @return total and used memory
     */
    public static String getMemoryUsageLog() {

        return MEMORY_ALLOCATED + "[" + String.valueOf(Runtime.getRuntime().totalMemory()) + "] " + MEMORY_USED + "[" +
                       String.valueOf(Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) + "]";
    }


# Call the method from the Invoker class.

# Write a pre-execute method to call the Memory log util.

  /**
     * pre-execute method, initializes the pre-execute data
     *
     * @throws XXXRunException
     */
    protected void preExecute() throws XXXRunException {
        //memory usage before job run
        mLogger.debug("Memory Usage before Job Run : " + LogUtil.getMemoryUsageLog());       
    }

# Run the Process/Job

# Write a pre-execute method to call the Memory log util.

    /**
     * post-execute method, initializes the post-execute data
     *
     * @throws XXXRunException
     */
    protected void postExecute() throws XXXRunException {
        //memory usage after job run
        mLogger.debug("Memory Usage after Job Run : " + LogUtil.getMemoryUsageLog());       
    }

Now look for the memory logs in your loggers....! 

Connect Oracle from Shell and execute SQL.

A shell script that connects to Oracle and execute SQL..
The script will run on Oracle installed box, and do authentication against the username, password..

and verifies the Oracle_sid passed is valid. Once the authentication passes it executes the sql on Oracle.

The skeleton of the Script looks similar to the below...

Begin your main method.

main() {main() {

Make sure the User executing the script is Oracle user.

#1. Make sure only user oracle can run the script
if [ "$(whoami)" != "oracle" ]; then
   echo "This script must be run as oracle user" 1>&2
   exit 1
fi

#2. Read the Oracle System User Name

oracle_system_user=$PARAM(system)

#3. Read the Oracle System User Password

oracle_system_password=$PARAM(password)

#4. Prompt for ORACLE_SID
database_name=$PARAM(test_db)

#5. Check the given credentials are valid and that the SID exists.
checkAuth # Have this function created outside of main method.

#6. If, the Authentication suceeds, Export the Oracle SID
export ORACLE_SID=database_name

#7. Execute the Query you need to.
echo "Processing..............please do not exit.........."
executeProcess # Have this metod outside of main method.
echo "Execution Success........!"

# Close the main method.

}

main


# function to check if the given SID is valid.
checkAuth() {
    #Connect to oracle
    sqlplus -s -l $oracle_system_user/$oracle_system_password@$database_name << STD_IN
    exit;
STD_IN
    errorCode=$?   
        if [ ${errorCode} -ne 0 ];then
            echo "Unable to connect the DB with the supplied credentials."
            exit 1
        fi   
}


# function to execute process, DATABASE_USERNAME.
executeProcess() {
    #Connect to Oracle
    sqlplus -s -l $oracle_system_user/$oracle_system_password@$database_name << STD_IN   
    select sysdate from dual;   
    exit;
STD_IN
}


For reading the Params we can prompt the User to enter the valuse dynamically., by writing a readParam function.,
instead of hardcoding the values for username, password and sid values.. In that way the script can be uses as dynamic.

Similarly, the executeProcess method can be used to execute Plsql blocks.

The command that goes between "<< STD_IN" and "STD_IN" will be directly interpretted by Oracle, so we can have
only the valid sql commands inside... and can give any user defined name for "STD_IN"..it can be "XXXYYZZ".

that is only to say the interpreter the commands inside will get directly executed....!