Programming Hive. Data Warehouse and Query Language for Hadoop Edward Capriolo, Dean Wampler, Jason Rutherglen

Authors: Edward Capriolo, Dean Wampler, Jason Rutherglen
Publisher: O'Reilly Media
Pages: 350
Available formats: ePub, Mobi

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

About the author

Edward Capriolo, who also wrote Cassandra High Performance Cookbook, is currently a system administrator at Media6degrees, where he helps design and maintain distributed data storage systems for the Internet advertising industry. Edward is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. He has experience as a developer as well as a Linux and network administrator and enjoys the rich world of open source software.

Book details

Ebook ISBN: 978-14-493-2697-5, 9781449326975
Ebook release date: 2012-09-19. The ebook release date is often the day the title went on sale and may differ from the release date of the print edition.
Publication language: English
ePub file size: 2.7 MB
Mobi file size: 7.5 MB

Categories:
Data » Data Analysis and Visualization » Business Intelligence
Data » Big Data and Databases » SQL
Data » Big Data and Databases » Data Engineering and Architecture

Table of contents

Programming Hive
Preface
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- What Brought Us to Hive?
  - Edward Capriolo
  - Dean Wampler
  - Jason Rutherglen
- Acknowledgments
1. Introduction
- An Overview of Hadoop and MapReduce
  - MapReduce
- Hive in the Hadoop Ecosystem
  - Pig
  - HBase
  - Cascading, Crunch, and Others
- Java Versus Hive: The Word Count Algorithm
- What's Next
2. Getting Started
- Installing a Preconfigured Virtual Machine
- Detailed Installation
  - Installing Java
    - Linux-specific Java steps
    - Mac OS X-specific Java steps
  - Installing Hadoop
  - Local Mode, Pseudodistributed Mode, and Distributed Mode
  - Testing Hadoop
  - Installing Hive
- What Is Inside Hive?
- Starting Hive
- Configuring Your Hadoop Environment
  - Local Mode Configuration
  - Distributed and Pseudodistributed Mode Configuration
  - Metastore Using JDBC
- The Hive Command
  - Command Options
- The Command-Line Interface
  - CLI Options
  - Variables and Properties
  - Hive One Shot Commands
  - Executing Hive Queries from Files
  - The .hiverc File
  - More on Using the Hive CLI
    - Autocomplete
  - Command History
  - Shell Execution
  - Hadoop dfs Commands from Inside Hive
  - Comments in Hive Scripts
  - Query Column Headers
3. Data Types and File Formats
- Primitive Data Types
- Collection Data Types
- Text File Encoding of Data Values
- Schema on Read
4. HiveQL: Data Definition
- Databases in Hive
- Alter Database
- Creating Tables
  - Managed Tables
  - External Tables
- Partitioned, Managed Tables
  - External Partitioned Tables
  - Customizing Table Storage Formats
- Dropping Tables
- Alter Table
  - Renaming a Table
  - Adding, Modifying, and Dropping a Table Partition
  - Changing Columns
  - Adding Columns
  - Deleting or Replacing Columns
  - Alter Table Properties
  - Alter Storage Properties
  - Miscellaneous Alter Table Statements
5. HiveQL: Data Manipulation
- Loading Data into Managed Tables
- Inserting Data into Tables from Queries
  - Dynamic Partition Inserts
- Creating Tables and Loading Them in One Query
- Exporting Data
6. HiveQL: Queries
- SELECT ... FROM Clauses
  - Specify Columns with Regular Expressions
  - Computing with Column Values
  - Arithmetic Operators
  - Using Functions
    - Mathematical functions
    - Aggregate functions
    - Table generating functions
    - Other built-in functions
  - LIMIT Clause
  - Column Aliases
  - Nested SELECT Statements
  - CASE ... WHEN ... THEN Statements
  - When Hive Can Avoid MapReduce
- WHERE Clauses
  - Predicate Operators
  - Gotchas with Floating-Point Comparisons
  - LIKE and RLIKE
- GROUP BY Clauses
  - HAVING Clauses
- JOIN Statements
  - Inner JOIN
  - Join Optimizations
  - LEFT OUTER JOIN
  - OUTER JOIN Gotcha
  - RIGHT OUTER JOIN
  - FULL OUTER JOIN
  - LEFT SEMI-JOIN
  - Cartesian Product JOINs
  - Map-side Joins
- ORDER BY and SORT BY
- DISTRIBUTE BY with SORT BY
- CLUSTER BY
- Casting
  - Casting BINARY Values
- Queries that Sample Data
  - Block Sampling
  - Input Pruning for Bucket Tables
- UNION ALL
7. HiveQL: Views
- Views to Reduce Query Complexity
- Views that Restrict Data Based on Conditions
- Views and Map Type for Dynamic Tables
- View Odds and Ends
8. HiveQL: Indexes
- Creating an Index
  - Bitmap Indexes
- Rebuilding the Index
- Showing an Index
- Dropping an Index
- Implementing a Custom Index Handler
9. Schema Design
- Table-by-Day
- Over Partitioning
- Unique Keys and Normalization
- Making Multiple Passes over the Same Data
- The Case for Partitioning Every Table
- Bucketing Table Data Storage
- Adding Columns to a Table
- Using Columnar Tables
  - Repeated Data
  - Many Columns
- (Almost) Always Use Compression!
10. Tuning
- Using EXPLAIN
- EXPLAIN EXTENDED
- Limit Tuning
- Optimized Joins
- Local Mode
- Parallel Execution
- Strict Mode
- Tuning the Number of Mappers and Reducers
- JVM Reuse
- Indexes
- Dynamic Partition Tuning
- Speculative Execution
- Single MapReduce Multi-GROUP BY
- Virtual Columns
11. Other File Formats and Compression
- Determining Installed Codecs
- Choosing a Compression Codec
- Enabling Intermediate Compression
- Final Output Compression
- Sequence Files
- Compression in Action
- Archive Partition
- Compression: Wrapping Up
12. Developing
- Changing Log4J Properties
- Connecting a Java Debugger to Hive
- Building Hive from Source
  - Running Hive Test Cases
  - Execution Hooks
- Setting Up Hive and Eclipse
- Hive in a Maven Project
- Unit Testing in Hive with hive_test
- The New Plugin Developer Kit
13. Functions
- Discovering and Describing Functions
- Calling Functions
- Standard Functions
- Aggregate Functions
- Table Generating Functions
- A UDF for Finding a Zodiac Sign from a Day
- UDF Versus GenericUDF
- Permanent Functions
- User-Defined Aggregate Functions
  - Creating a COLLECT UDAF to Emulate GROUP_CONCAT
- User-Defined Table Generating Functions
  - UDTFs that Produce Multiple Rows
  - UDTFs that Produce a Single Row with Multiple Columns
  - UDTFs that Simulate Complex Types
- Accessing the Distributed Cache from a UDF
- Annotations for Use with Functions
  - Deterministic
  - Stateful
  - DistinctLike
- Macros
14. Streaming
- Identity Transformation
- Changing Types
- Projecting Transformation
- Manipulative Transformations
- Using the Distributed Cache
- Producing Multiple Rows from a Single Row
- Calculating Aggregates with Streaming
- CLUSTER BY, DISTRIBUTE BY, SORT BY
- GenericMR Tools for Streaming to Java
- Calculating Cogroups
15. Customizing Hive File and Record Formats
- File Versus Record Formats
- Demystifying CREATE TABLE Statements
- File Formats
  - SequenceFile
  - RCFile
  - Example of a Custom Input Format: DualInputFormat
- Record Formats: SerDes
- CSV and TSV SerDes
- ObjectInspector
- Think Big Hive Reflection ObjectInspector
- XML UDF
- XPath-Related Functions
- JSON SerDe
- Avro Hive SerDe
  - Defining Avro Schema Using Table Properties
  - Defining a Schema from a URI
  - Evolving Schema
- Binary Output
16. Hive Thrift Service
- Starting the Thrift Server
- Setting Up Groovy to Connect to HiveService
- Connecting to HiveServer
- Getting Cluster Status
- Result Set Schema
- Fetching Results
- Retrieving Query Plan
- Metastore Methods
  - Example Table Checker
    - Finding tables not marked as external
- Administrating HiveServer
  - Productionizing HiveService
  - Cleanup
- Hive ThriftMetastore
  - ThriftMetastore Configuration
  - Client Configuration
17. Storage Handlers and NoSQL
- Storage Handler Background
- HiveStorageHandler
- HBase
- Cassandra
  - Static Column Mapping
  - Transposed Column Mapping for Dynamic Columns
  - Cassandra SerDe Properties
- DynamoDB
18. Security
- Integration with Hadoop Security
- Authentication with Hive
- Authorization in Hive
  - Users, Groups, and Roles
  - Privileges to Grant and Revoke
  - Partition-Level Privileges
  - Automatic Grants
19. Locking
- Locking Support in Hive with Zookeeper
- Explicit, Exclusive Locks
20. Hive Integration with Oozie
- Oozie Actions
  - Hive Thrift Service Action
- A Two-Query Workflow
- Oozie Web Console
- Variables in Workflows
- Capturing Output
- Capturing Output to Variables
21. Hive and Amazon Web Services (AWS)
- Why Elastic MapReduce?
- Instances
- Before You Start
- Managing Your EMR Hive Cluster
- Thrift Server on EMR Hive
- Instance Groups on EMR
- Configuring Your EMR Cluster
  - Deploying hive-site.xml
  - Deploying a .hiverc Script
    - Deploying .hiverc using a config step
    - Deploying a .hiverc using a bootstrap action
  - Setting Up a Memory-Intensive Configuration
- Persistence and the Metastore on EMR
- HDFS and S3 on EMR Cluster
- Putting Resources, Configs, and Bootstrap Scripts on S3
- Logs on S3
- Spot Instances
- Security Groups
- EMR Versus EC2 and Apache Hive
- Wrapping Up
22. HCatalog
- Introduction
- MapReduce
  - Reading Data
  - Writing Data
- Command Line
- Security Model
- Architecture
23. Case Studies
- m6d.com (Media6Degrees)
  - Data Science at M6D Using Hive and R
  - M6D UDF Pseudorank
  - M6D Managing Hive Data Across Multiple MapReduce Clusters
    - Cross deployment queries with Hive
    - Replicating Hive data between deployments
- Outbrain
  - In-Site Referrer Identification
    - Cleaning up the URLs
    - Determining referrer type
    - Multiple URLs
  - Counting Uniques
    - Why this is a problem
    - Load a temp table
    - Querying the temp table
  - Sessionization
    - Setting it up
    - Finding origin pageviews
    - Bucketing PVs to origins
    - Aggregating on origins
    - Aggregating on origin type
    - Measure engagement
- NASA's Jet Propulsion Laboratory
  - The Regional Climate Model Evaluation System
  - Our Experience: Why Hive?
  - Some Challenges and How We Overcame Them
    - Conclusion
- Photobucket
  - Big Data at Photobucket
  - What Hardware Do We Use for Hive?
  - What's in Hive?
  - Who Does It Support?
- SimpleReach
- Experiences and Needs from the Customer Trenches
  - A Karmasphere Perspective
  - Introduction
  - Use Case Examples from the Customer Trenches
    - Customer trenches #1: Optimal data formatting for Hive
    - Customer trenches #2: Partitions and performance
    - Customer trenches #3: Text analytics with Regex, Lateral View Explode, Ngram, and other UDFs
      - Apache Hive in production: Incremental needs and capabilities
      - About Karmasphere
Glossary
A. References
Index
About the Authors
Colophon
Copyright