Monday, June 29, 2015

Hadoop Archives (HAR) - Creating and Reading HAR Files



A quick post that explains the following, with examples:
  • Create a HAR file
  • List the Contents of a HAR file
  • Read the contents of a file that is within a HAR


Listed below is the input directory structure in HDFS that I'll use to create a HAR:

hadoop fs -ls /bejoyks/test/har/source_files/*
Found 2 items
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir01/file1.tsv
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir01/file2.tsv
Found 2 items
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir02/file3.tsv
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir02/file4.tsv


CLI Command to create a HAR

Syntax
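hadoop archive -archiveName <archiveName.har> -p <ParentDirHDFS> -r <ReplicationFactor> <childDir01> <childDir02> <DestinationDirectoryHDFS>
The child directories are given relative to the -p parent directory, and the .har is written into the destination directory. Creating the archive runs as a MapReduce job.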

Command Used
hadoop archive -archiveName tsv_daily.har -p /bejoyks/test/har/source_files -r 3 srcDir01 srcDir02 /bejoyks/test/har/destination
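If you build archives regularly, the command can be wrapped in a small shell function. This is only a sketch: `create_har` is a hypothetical helper, not part of Hadoop, and it assumes the `hadoop` launcher is on your PATH. It passes its arguments straight through to `hadoop archive` in the order shown in the syntax above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): wraps the "hadoop archive" command above.
# Usage: create_har <archiveName.har> <parentDirHDFS> <destDirHDFS> <replicationFactor> <childDir>...
# Child directories are relative to <parentDirHDFS>, as with -p.
create_har() {
  if [ "$#" -lt 4 ]; then
    echo "usage: create_har <archiveName.har> <parentDir> <destDir> <replication> <childDir>..." >&2
    return 2
  fi
  archive_name="$1"
  parent_dir="$2"
  dest_dir="$3"
  replication="$4"
  shift 4
  # After the shift, "$@" holds only the child directories.
  hadoop archive -archiveName "$archive_name" -p "$parent_dir" \
    -r "$replication" "$@" "$dest_dir"
}
```

With the directories used in this post, `create_har tsv_daily.har /bejoyks/test/har/source_files /bejoyks/test/har/destination 3 srcDir01 srcDir02` expands to the same command as above. The destination is taken as the third argument (rather than last, as in `hadoop archive`) so that the variable-length list of child directories can go at the end.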


LISTING DIRS and FILES in a HAR
Syntax
hadoop fs -ls har://<AbsolutePathOfHarFile>

Command Used and Output
Command 01:
hadoop fs -ls har:///bejoyks/test/har/destination/tsv_daily.har
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01
drwxr-xr-x   - hadoop supergroup          0 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir02

Command 02:
hadoop fs -ls har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01
Found 2 items
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file1.tsv
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv
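Because an absolute HDFS path already starts with `/`, putting `har://` in front of it gives the triple-slash form used above (`har:///...`), which resolves against the cluster's default file system. The Hadoop Archives documentation also describes a fully qualified form, `har://<scheme>-<host>:<port>/<archivePath>`. The function below is a hypothetical helper (not a Hadoop command) that builds either form; `namenode:8020` in the test values is a placeholder, not a real host.

```shell
#!/bin/sh
# Hypothetical helper: build a har:// URI for an archive, optionally pointing inside it.
# Usage: har_uri <absoluteArchivePath> [innerPath] [host:port]
#   without host:port -> har:///<archivePath>/<innerPath>   (default file system)
#   with host:port    -> har://hdfs-<host:port>/<archivePath>/<innerPath>
har_uri() {
  archive_path="$1"
  inner_path="${2:-}"
  authority="${3:-}"
  case "$archive_path" in
    /*) ;;
    *) echo "har_uri: archive path must be absolute: '$archive_path'" >&2; return 1 ;;
  esac
  uri_path="$archive_path"
  if [ -n "$inner_path" ]; then
    # Drop a leading slash on the inner path to avoid a "//" in the middle.
    uri_path="$archive_path/${inner_path#/}"
  fi
  if [ -n "$authority" ]; then
    echo "har://hdfs-$authority$uri_path"
  else
    echo "har://$uri_path"
  fi
}
```

For example, `hadoop fs -ls "$(har_uri /bejoyks/test/har/destination/tsv_daily.har srcDir01)"` reproduces Command 02, and `hadoop fs -ls -R "$(har_uri /bejoyks/test/har/destination/tsv_daily.har)"` lists the whole archive recursively.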

READING a File within a HAR
hadoop fs -text har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv
file2    row1
file2    row2

** Common mistakes while reading a HAR file

Always use the har:// URI when reading a HAR file
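Since we are used to listing directories and files in HDFS without a URI scheme, we might be tempted to follow the same pattern here. But a HAR path without the har:// prefix is treated as an ordinary HDFS directory, so instead of the archived files you'll see the HAR's internal layout: the _index and _masterindex metadata files and the part-0 file that holds the concatenated contents of the archived files (here 4 files of 22 bytes each = 88 bytes). Something like below: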

hadoop fs -ls /bejoyks/test/har/destination/tsv_daily.har
Found 3 items
-rw-r--r--   5 hadoop supergroup        277 2015-06-29 20:39 /bejoyks/test/har/destination/tsv_daily.har/_index
-rw-r--r--   5 hadoop supergroup         23 2015-06-29 20:39 /bejoyks/test/har/destination/tsv_daily.har/_masterindex
-rw-r--r--   3 hadoop supergroup         88 2015-06-29 20:39 /bejoyks/test/har/destination/tsv_daily.har/part-0
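One way to avoid this mistake in scripts is to refuse anything that is not a `har://` URI before reading. The sketch below is a hypothetical wrapper (`har_text` is not a Hadoop command); it assumes `hadoop` is on the PATH and otherwise just calls `hadoop fs -text` exactly as in the reading example.

```shell
#!/bin/sh
# Hypothetical guard (not a Hadoop command): only read paths that use the har:// scheme.
# A plain HDFS path to a .har directory would show the archive's internal
# _index, _masterindex and part-* files instead of the archived files.
# Usage: har_text <har:// URI of a file inside the archive>
har_text() {
  case "$1" in
    har://*)
      hadoop fs -text "$1"
      ;;
    *)
      echo "har_text: expected a har:// URI (e.g. har:///path/to/archive.har/dir/file), got '$1'" >&2
      return 1
      ;;
  esac
}
```

`har_text har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv` behaves like the plain `hadoop fs -text` call, while the same path without `har://` is rejected with an error instead of silently pointing at the archive internals.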
