The Apache HBase Book Study Notes (Part 2)


1.3 Not-so-quick Start Guide
1.3.1 Requirements
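  HBase has the following requirements. Read this section carefully and make sure every requirement is met. A misconfiguration can lead to strange errors or to data loss.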
1.3.1.1 Java
  Just like Hadoop, HBase requires a Java 6 or later runtime.
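1.3.1.2 Hadoop
   This version of HBase runs only on Hadoop 0.20.x. As of October 2010 it does not run on Hadoop 0.21.x or 0.22.x. HBase will lose data unless it runs on an HDFS that has a durable sync. Currently only the branch-0.20-append branch has this feature. No official release has been made from this branch so far, so you either build your own Hadoop from it or use Cloudera's CDH3. CDH includes the 0.20-append patches that add durable sync (CDH3 is still in beta; CDH3b2 or CDH3b3 will suffice). See CHANGES.txt in branch-0.20-append for the list of patches it includes.
   Because HBase depends on Hadoop, it bundles a Hadoop jar in its lib directory. The bundled jar was built from the Apache branch-0.20-append branch. The Hadoop version on your cluster must match the one HBase uses. If your cluster runs a Hadoop that was not built from branch-0.20-append, replace the Hadoop jar in the HBase lib directory with the Hadoop jar your cluster is running, to avoid version mismatch problems. For example, CDH versions do not have HDFS-724 whereas branch-0.20-append does. This patch changes the RPC version because the protocol changed. A version mismatch causes many kinds of problems.
   Hadoop Security
    HBase runs on any Hadoop 0.20.x that incorporates the Hadoop security features, for example Y! 0.20S or CDH3B3, as long as you follow the advice above and replace the Hadoop jar that ships with HBase with the secure version.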


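1.3.1.3 ssh
  ssh must be installed and sshd must be running so that the Hadoop scripts can manage remote Hadoop and HBase daemons. You must be able to ssh to every node, including the local node, using passwordless login (Google "ssh passwordless login").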
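1.3.1.4 DNS
  HBase uses the local hostname to report its IP address. Both forward and reverse DNS resolution must work.

  If your machine has multiple interfaces, HBase uses the interface that the primary hostname resolves to.
  If that is not sufficient, you can set hbase.regionserver.dns.interface to name the primary interface. This only works if your cluster configuration is consistent and every host has the same network interface configuration.
  Another option is to set hbase.regionserver.dns.nameserver to choose a nameserver other than the system-wide default.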
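
  The forward and reverse DNS requirement can be checked with a short script before starting HBase. The following is a minimal sketch, not part of HBase: the function name dns_round_trip_ok is hypothetical, and the resolver functions are passed in as parameters only so the check can be exercised without a real DNS server.

```python
import socket


def dns_round_trip_ok(hostname,
                      forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Return True if hostname resolves to an IP and that IP resolves back to it.

    `forward` and `reverse` default to the standard-library resolvers; they are
    parameters only so the check can be tested with stand-in functions.
    """
    try:
        ip = forward(hostname)                    # forward lookup: name -> IP
        name, aliases, _addrs = reverse(ip)       # reverse lookup: IP -> names
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return False
    known_names = {name.lower(), *(alias.lower() for alias in aliases)}
    return hostname.lower() in known_names
```

  Running dns_round_trip_ok(socket.getfqdn()) on each node gives a quick indication of whether the name a node reports for itself also resolves back from its address.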
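1.3.1.5 NTP (Network Time Protocol)
   The clocks on cluster members should be in basic agreement. A little skew is tolerable, but a large skew can cause odd behavior, so run NTP, or an equivalent, on your cluster.

    If you are having problems querying data, or are seeing strange cluster operations, check the system time!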
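
   One way to spot a badly skewed clock is to collect a timestamp from every node and compare each against the cluster median. The sketch below assumes the readings (epoch seconds per host) have already been gathered by some other means. The function name skewed_hosts and the 30-second default tolerance are illustrative choices, not values taken from HBase.

```python
from statistics import median


def skewed_hosts(clock_readings, tolerance_seconds=30.0):
    """Return the hosts whose clock differs from the median by more than the tolerance.

    clock_readings maps host name -> epoch seconds reported by that host.
    The default tolerance is an arbitrary illustrative value.
    """
    if not clock_readings:
        return []
    reference = median(clock_readings.values())
    return sorted(
        host
        for host, timestamp in clock_readings.items()
        if abs(timestamp - reference) > tolerance_seconds
    )
```

   Using the median as the reference keeps a single wildly wrong clock from shifting the baseline that the other hosts are compared against.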

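1.3.1.6 ulimit (limits on the resources available to processes started from the shell)
   HBase is a database and uses a lot of files at the same time. The default ulimit -n of 1024 on *nix systems is insufficient (ulimit -n sets the maximum number of file descriptors a process may have open at once). Any significant load will lead you to the FAQ entry: Why do I see "java.io.IOException...(Too many open files)" in my logs?
   How to resolve this:
 HBase is currently a heavy consumer of file handles. When running an HBase loaded with more than a few regions, it is possible to exceed the common default limit of 1024 open file handles for the user running the process. Running out of file handles is like an OOME: things start to fail in strange ways. To raise the user's file handle limit, edit /etc/security/limits.conf on all nodes and then restart the cluster.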

# Each line describes a limit for a user in the form:
#
# domain    type    item    value
#
hbase     -    nofile  32768

 Here hbase is the user that HBase runs as. To test the setting, log in again as that user and run ulimit -n.
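
 The same check that ulimit -n performs can be scripted with Python's standard resource module (Unix only). This is a minimal sketch: the function name nofile_shortfall is hypothetical, and the default of 32768 simply mirrors the limits.conf example above.

```python
import resource


def nofile_shortfall(required=32768, limits=None):
    """Return how far the soft open-file limit falls short of `required` (0 if enough).

    `limits` is an optional (soft, hard) pair; when omitted, the limits of the
    current process are read with resource.getrlimit.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft, _hard = limits
    if soft == resource.RLIM_INFINITY:
        return 0  # an unlimited soft limit always satisfies the requirement
    return max(0, required - soft)
```

 Calling nofile_shortfall() as the user that runs HBase returns 0 once the new limit is in effect for that session.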

Besides the setting above, you may also need to adjust the system-wide limit, mainly fs.file-max in /etc/sysctl.conf:

# vi /etc/sysctl.conf

fs.file-max = 32000

# vi /etc/security/limits.conf

youruser       soft    nofile   10000
youruser       hard    nofile   30000
 
hard
for enforcing hard resource limits. These limits are set by the superuser and 
enforced by the Kernel. The user cannot raise his requirement of system resources
above such values.

soft
for enforcing soft resource limits. These limits are ones that the user
can move up or down within the permitted range by any pre-existing hard 
limits. The values specified with this token can be thought of as default values,
for normal system usage.
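
The relationship between the two limits can be illustrated from inside a process: an unprivileged process may raise its own soft limit, but only up to the hard limit. The sketch below uses Python's standard resource module (Unix only). The helper names clamp_to_hard and raise_soft_nofile are hypothetical.

```python
import resource


def clamp_to_hard(target_soft, hard):
    """Return the highest soft limit that is allowed for the given hard limit."""
    if hard == resource.RLIM_INFINITY:
        return target_soft
    return min(target_soft, hard)


def raise_soft_nofile(target_soft):
    """Raise this process's soft open-file limit toward target_soft.

    The soft limit is never set above the hard limit and is never lowered.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = clamp_to_hard(target_soft, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

With the limits.conf values shown earlier (soft 10000, hard 30000), clamp_to_hard(50000, 30000) yields 30000, which is the most such a process could request for itself.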
 





  
 
