Tuesday, August 21, 2012

How to migrate a hive table from one hive instance to another or between hive databases


Hive has the EXPORT IMPORT feature since hive 0.8. With this feature you can export the metadata as well as the data for a corresponding table to a file in hdfs using the EXPORT command. The data is stored in json format. Data once exported this way could be imported back to another database or hive instance using the IMPORT command.

The syntax looks something like this:
EXPORT TABLE table_or_partition TO hdfs_path;
IMPORT [[EXTERNAL] TABLE table_or_partition] FROM hdfs_path [LOCATION [table_location]];

Some sample statements would look like:
EXPORT TABLE <table name> TO 'location in hdfs';

Use test_db;
IMPORT FROM 'location in hdfs';

Export Import can be appled on a partition basis as well:
EXPORT TABLE <table name> PARTITION (loc="USA") to 'location in hdfs';

The below import commands imports to an external table instead of a managed one
IMPORT EXTERNAL TABLE FROM 'location in hdfs' LOCATION ‘/location/of/external/table’;

6 comments:

  1. This is very nice article. This is very use ful for Hadoop Learners.

    ReplyDelete
  2. hadoop is new technology ,this article is very good for the beginner,i was learn hadoop from Easylearning Guru it was help me the most
    and in this there is life time support system which will help me to clear my doubt any time any where

    ReplyDelete
  3. Thanks for InformationHadoop Course will provide the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. This course will further examine related technologies such as Hive, Pig, and Apache Accumulo. HADOOP Online Training

    ReplyDelete
  4. Hadoop is really a good booming technology in coming days. And good path for the people who are looking for the Good Hikes and long career. We also provide Hadoop online training

    ReplyDelete
  5. Thanks for the post. I have a scenario where i have to dumb entire database to hive instance on different cluster (on different network). I have 100's of table in database. It's really hard to do for each. I guess i do this using python script in worst case. But i'm just curious is there any simple command like "export database dbname" ?

    ReplyDelete