Thursday, May 18, 2017

C2090-101 IBM Big Data Engineer

Test information:
Number of questions: 53
Time allowed in minutes: 75
Required passing score: 65%
Languages: English

This test consists of 5 sections containing a total of 53 multiple-choice questions. The percentages after each section title reflect the approximate distribution of the total question set across the sections.

Data Loading (34%)
Load unstructured data into InfoSphere BigInsights
Import streaming data into Hadoop using InfoSphere Streams
Create a BigSheets workbook
Import data into Hadoop and create Big SQL table definitions
Import data to HBase
Import data to Hive
Use Data Click to load from relational sources into InfoSphere BigInsights with a self-service process
Extract data from a relational source using Sqoop
Load log data into Hadoop using Flume
Insert data via the IBM General Parallel File System (GPFS) POSIX file system API
Load data with Hadoop command line utility

Data Security (8%)
Keep data secure within PCI standards
Use masking (e.g. Optim, Big SQL) and redaction to protect sensitive data

Architecture and Integration (17%)
Implement MapReduce
Evaluate use cases for selecting Hive, Big SQL, or HBase
Create and/or query a Solr index
Evaluate use cases for selecting potential file formats (e.g. JSON, CSV, Parquet, Sequence, etc.)
Utilize Apache Hue for search visualization

Performance and Scalability (15%)
Use Resilient Distributed Dataset (RDD) to improve MapReduce performance
Choose file formats to optimize performance of Big SQL, JAQL, etc.
Make specific performance tuning decisions for Hive and HBase
Analyze performance considerations when using Apache Spark

Data Preparation, Transformation, and Export (26%)
Use Jaql query methods to transform data in InfoSphere BigInsights
Capture and prep social data for analytics
Integrate SPSS model scoring in InfoSphere Streams
Implement entity resolution within a Big Data platform (e.g. Big Match)
Utilize Pig for data transformation and data manipulation
Use Big SQL to transform data in InfoSphere BigInsights
Export processing results out of Hadoop (e.g. DataClick, DataStage, etc.)
Utilize consistent regions in InfoSphere Streams to ensure at-least-once processing

IBM Certified Data Engineer - Big Data

Job Role Description / Target Audience

This certification is intended for IBM Big Data Engineers. The Big Data Engineer works directly with the Data Architect and hands-on Developers to convert the architect's Big Data vision and blueprint into a Big Data reality. The Data Engineer possesses a deep level of technical knowledge and experience across a wide array of products and technologies. A Data Engineer understands how to apply technologies to solve big data problems, and has the ability to build large-scale data processing systems for the enterprise. Data engineers develop, maintain, test and evaluate big data solutions within organizations. They provide input on the needed hardware and software to the architects.

Big Data Engineers focus on collecting, parsing, managing and analyzing large data sets in order to provide the right data sets and visual tools for analysis to the data scientists. They understand the complexity of data and can handle its variety (structured, semi-structured, unstructured), volume, velocity (including stream processing), and veracity. They also address the information governance and security challenges associated with the data. They have a good background in software engineering and extensive programming and scripting experience.

To attain the IBM Certified Data Engineer - Big Data certification, candidates must pass one test. To gain additional knowledge and skills, and prepare for the test based on the job roles and test objectives, click on the link to the test below and refer to the Test Preparation tab.

Recommended Prerequisite Skills
Understand the data layer and particular areas of potential challenge/risk in the data layer
Ability to translate functional requirements into technical specifications.
Ability to take overall solution/logical architecture and provide physical architecture.
Understand Cluster Management
Understand Network Requirements
Understand Important interfaces
Understand Data Modeling
Ability to identify/support non-functional requirements for the solution
Understand Latency
Understand Scalability
Understand High Availability
Understand Data Replication and Synchronization
Understand Disaster Recovery
Understand Overall performance (Query Performance, Workload Management, Database Tuning)
Propose recommended and/or best practices regarding the movement, manipulation, and storage of data in a big data solution, including, but not limited to:
Understand Data ingestion technical options
Understand Data storage options and ramifications (for example, understand the additional requirements and challenges introduced by data in the cloud)
Understand Data querying techniques & availability to support analytics
Understand Data lineage and data governance
Understand Data variety (social, machine data) and data volume
Understand/Implement and provide guidance around data security to support implementation, including but not limited to:
Understand LDAP Security
Understand User Roles/Security
Understand Data Monitoring
Understand Personally Identifiable Information (PII) Data Security considerations

Software areas of central focus:
BigInsights
BigSQL
Hadoop
Cloudant (NoSQL)

Software areas of peripheral focus:
Information Server
Integration with BigInsights, Balanced Optimization for Hadoop, JAQL push-down capability, etc.
Data Governance
Security features of BigInsights
Information Server (MetaData Workbench for Lineage)
Optim Integration with BigInsights (archival)
DataClick for BigInsights (Future: DataClick for Cloudant, to pull operational data into Hadoop for analytics; scripts available today)
BigMatch (trying to get to a single view)
Guardium (monitoring)
Analytic Tools (SPSS)
BigSheets
Support in Hadoop/BigInsights
Data Availability and Querying Support
Streams
Interface/Integration with BigInsights
Streaming Data Concepts
In memory analytics
Netezza
DB2 BLU
Graph Databases
Machine Learning (System ML)
