Bigtable Summary
What is it?
-> relatively expensive, because you pay for the number of nodes you provision
-> e.g. with 10 nodes: about 100,000 queries per second at 6 ms latency (roughly 10,000 QPS per node)
-> low latency
-> high throughput -> fast
-> structured data
-> NOT transactional (only operations on a single row are atomic)
-> NOT SQL
-> global availability
-> durable and replicated, so your data stays accessible
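The sizing figure above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch that assumes roughly 10,000 QPS per node, a number derived only from the 10-node / 100,000 QPS figure in these notes; `nodes_needed` and `QPS_PER_NODE` are hypothetical names. Because you pay per node, the node count is also a rough proxy for cost.

```python
import math

# Assumption: ~10,000 QPS per node, derived from the note above
# (10 nodes -> 100,000 QPS at ~6 ms). Real throughput depends on the
# workload, row sizes and node type, so treat this as a rule of thumb only.
QPS_PER_NODE = 10_000


def nodes_needed(target_qps: int, qps_per_node: int = QPS_PER_NODE) -> int:
    """Smallest whole number of nodes whose combined throughput covers target_qps."""
    # Round up (you cannot provision a fraction of a node) and never suggest zero.
    return max(1, math.ceil(target_qps / qps_per_node))


if __name__ == "__main__":
    for qps in (100_000, 250_000, 1):
        print(f"{qps:>7} QPS -> {nodes_needed(qps)} node(s)")
```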
Serverless?
No
Benefits
- Incredible scalability. Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits the performance after a certain QPS is reached. Cloud Bigtable does not have this bottleneck, and so you can scale your cluster up to handle more queries.
- Simple administration. Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. No more managing masters or regions; just design your table schemas, and Cloud Bigtable will handle the rest for you.
- Cluster resizing without downtime. You can increase the size of a Cloud Bigtable cluster for a few hours to handle a large load, then reduce the cluster's size again—all without any downtime. After you change a cluster's size, it typically takes just a few minutes under load for Cloud Bigtable to balance performance across all of the nodes in your cluster.
What is it good for?
Cloud Bigtable is a natural fit for time-series data. Typical use cases:
- Time-series data, such as CPU and memory usage over time for multiple servers.
- Marketing data, such as purchase histories and customer preferences.
- Financial data, such as transaction histories, stock prices, and currency exchange rates.
- Internet of Things data, such as usage reports from energy meters and home appliances.
- Graph data, such as information about how users are connected to one another.
How to use it?
cbt
- a command-line interface for performing several different operations on Cloud Bigtable.
HBase shell
- use the HBase shell to connect to a Cloud Bigtable instance, perform basic administrative tasks, and read and write data in a table
Indexing
-> only the row key is indexed; no other column can be indexed
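To see why only the row key is indexed, it helps to picture a table as a map kept sorted by row key. The toy model below is a hypothetical in-memory stand-in (`ToyTable` is not a real Bigtable class, and column families are omitted): a lookup by row key is a binary search, while filtering on any other column has to examine every row.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]  # column qualifier -> value (column families omitted)


class ToyTable:
    """Hypothetical stand-in for a Bigtable table: rows sorted by row key only."""

    def __init__(self) -> None:
        self._keys: List[str] = []  # sorted row keys: the only "index"
        self._rows: Dict[str, Row] = {}

    def put(self, key: str, row: Row) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._rows[key] = row

    def get(self, key: str) -> Optional[Row]:
        # Lookup by row key: binary search over the sorted keys, O(log n).
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[key]
        return None

    def find_by_column(self, column: str, value: str) -> Tuple[List[str], int]:
        # No secondary index: every row must be examined (a full table scan).
        matches: List[str] = []
        scanned = 0
        for key in self._keys:
            scanned += 1
            if self._rows[key].get(column) == value:
                matches.append(key)
        return matches, scanned
```

For example, after inserting `user#002`, `user#001` and `user#003`, `get("user#001")` touches only O(log n) keys, whereas `find_by_column("country", "AU")` reports that it scanned all 3 rows.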
Design
As a summary, strike a balance between:
Distributing the read load across tablets (you don't want all reads to hit a single tablet)
AND
Distributing the write load across tablets (you don't want all writes to hit a single tablet)
AND
Designing the row key so that common queries return consecutive rows
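The "consecutive rows" goal can be illustrated with a composite row key. This is a minimal sketch assuming a hypothetical `server#metric#timestamp` layout with a zero-padded timestamp (so text order matches numeric order); `make_key`, `prefix_range` and `scan_prefix` are illustrative helpers, not client-library functions. Because rows are sorted by key, every row sharing a prefix falls into one contiguous range, which is what a prefix or row-range read exploits.

```python
import bisect
from typing import List, Tuple


def make_key(server_id: str, metric: str, ts: int) -> str:
    # Hypothetical layout: server#metric#timestamp, with the timestamp
    # zero-padded so lexicographic order matches numeric order.
    return f"{server_id}#{metric}#{ts:010d}"


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Half-open [start, end) range covering every key that starts with prefix."""
    # Bumping the last character gives the first string past all keys with
    # this prefix. Assumes a non-empty prefix.
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, end


def scan_prefix(sorted_keys: List[str], prefix: str) -> List[str]:
    start, end = prefix_range(prefix)
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]  # one contiguous slice = consecutive rows
```

With keys for `srv-a` and `srv-b`, each with `cpu` and `mem` readings, `scan_prefix(keys, "srv-a#cpu#")` returns just the two `srv-a` CPU rows as one slice. The real client libraries expose row-range reads that play the same role.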
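First, check whether the fields you need to query on are part of the row key.
Then check whether the key contains any of the following, to avoid hotspotting: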
Avoid using a row key that is a domain or starts with a domain (the domain can still be one part of the key)
-> because some domains are far more active than others
-> the tablets holding the rows for those busy customers' domains become hotspots
Avoid using User ID as row key if user IDs are sequentially assigned
-> it is OK if your user IDs are randomly assigned, e.g. derived from a hash
-> because in many applications, newer users are more active than users created 6-7 years ago
-> so if user IDs are assigned sequentially, the tablets holding new users' rows tend to be busier -> hotspotting
Avoid using a static identifier as a row key, especially one that keeps getting written to
-> e.g. if your row keys are just memory usage, CPU usage or disk usage and you keep updating those same rows over and over, the nodes that serve them get overworked
Avoid using dates (alone or at the start of the row key): most writes carry the latest date, so they all land on the same tablet -> hotspotting
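The rules above can be sketched as two key builders. Both are hypothetical helpers under stated assumptions, not an official recipe. `hashed_user_key` prefixes a sequential user ID with a few hex characters of its SHA-256 hash (a hash-prefix, or "salting", technique swapped in here to illustrate "randomly assigned" IDs), so consecutive IDs scatter across the key space. `metric_key` replaces a static key like "cpu_usage" and a date-first key with `server#metric#reversed-timestamp`; `MAX_TS` is an assumed upper bound on millisecond timestamps.

```python
import hashlib

# Assumption: millisecond timestamps stay below this bound (13 digits).
MAX_TS = 10**13 - 1


def hashed_user_key(user_id: int, prefix_len: int = 4) -> str:
    # Hash prefix: consecutive (sequential) IDs get unrelated prefixes, so
    # writes for new users spread over many tablets instead of one.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()[:prefix_len]
    return f"{digest}#{user_id}"


def metric_key(server_id: str, metric: str, ts_ms: int) -> str:
    # Lead with the server ID (spreads writes across servers), then the
    # metric, then a reversed timestamp so each write is a new row and the
    # newest reading for a server/metric sorts first.
    if not 0 <= ts_ms <= MAX_TS:
        raise ValueError("timestamp out of the assumed range")
    reverse_ts = MAX_TS - ts_ms
    return f"{server_id}#{metric}#{reverse_ts:013d}"
```

The trade-off ties back to the balance in the design summary: the hash prefix spreads writes but gives up range scans over consecutive user IDs, while `metric_key` keeps one server's readings consecutive.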