SAP BODS Interview Questions & Answers


What is SAP BODS?

Answer

SAP BODS (BusinessObjects Data Services) is a software tool designed by BusinessObjects (now an SAP company).
Some of the basic purposes of this tool are to perform jobs like −

  • ETL (Extraction, Transformation and Loading) − Pulling data out of any system/database/table, applying changes to modify the data or applying programming logic to enrich the extracted data, and loading the data into any other system/database or table. E.g.: ETL of data from a SQL Server database to Oracle (see the sketch after this list).
  • Data Warehousing – A database specifically designed and developed in a particular format to enable easy data analysis or reporting. This could be developed using data from various databases or any other data sources.
  • Data Migration – Moving of data from one place to another. This is a subset of ETL where data is relocated from one software system or database to another. This also involves modification and alteration of data.
  • Business Intelligence – A concept which combines the data warehousing system and reporting. It is applied to analyze an organization's data in order to effectively perform functions like business performance improvement.
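
To make the ETL idea concrete, below is a minimal sketch in Python. It is purely illustrative: sqlite3 stands in for the real source and target databases (e.g. SQL Server and Oracle), and every table and column name is invented. In BODS itself the same flow is built graphically from source, transform, and target objects.

    # Minimal ETL sketch: sqlite3 stands in for real source/target databases.
    import sqlite3

    source = sqlite3.connect("source.db")   # stand-in for e.g. SQL Server
    target = sqlite3.connect("target.db")   # stand-in for e.g. Oracle

    source.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    source.execute("DELETE FROM customers")
    source.executemany("INSERT INTO customers VALUES (?, ?)",
                       [(1, "alice"), (2, "bob")])

    # Extract: pull the data out of the source system
    rows = source.execute("SELECT id, name FROM customers").fetchall()

    # Transform: apply a simple change to the extracted data
    rows = [(cid, name.upper()) for cid, name in rows]

    # Load: write the result into the target system
    target.execute("CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER, name TEXT)")
    target.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)
    target.commit()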

What is the difference between an OLTP system and a Data warehouse (OLAP)?

Answer

  • Indexes − An OLTP system has only a few indexes, while an OLAP system has many indexes for performance optimization.
  • Joins − In an OLTP system there are a large number of joins and the data is normalized; in an OLAP system there are fewer joins and the data is de-normalized.
  • Aggregation − In an OLTP system data is not aggregated, while in an OLAP database more aggregations are used (see the sketch below).
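
The contrast can be made concrete with a small schema sketch. This is illustrative only (sqlite3 as the database, all names invented): the OLTP side is normalized and join-heavy; the OLAP side is denormalized, pre-aggregated, and indexed for fast reads.

    # OLTP vs. OLAP schema sketch; all names are invented for illustration.
    import sqlite3

    db = sqlite3.connect(":memory:")

    # OLTP: normalized detail tables, joined at query time
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
               "customer_id INTEGER, amount REAL)")

    # OLAP: one denormalized, pre-aggregated table with an extra index
    db.execute("CREATE TABLE sales_summary (customer_name TEXT, "
               "order_count INTEGER, total_amount REAL)")
    db.execute("CREATE INDEX idx_summary_name ON sales_summary (customer_name)")

    # The join and aggregation happen once, at load time, not at query time
    db.execute("""INSERT INTO sales_summary
                  SELECT c.name, COUNT(o.id), SUM(o.amount)
                  FROM customers c JOIN orders o ON o.customer_id = c.id
                  GROUP BY c.name""")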

Why do we need a staging area in an ETL process?

Answer

A staging area is required during the ETL load for various reasons −

  • Source systems are only available for a specific time period to extract data, and this period may be less than the total data load time. A staging area allows you to extract the data from the source system and keep it in the staging area before the time slot ends.
  • A staging area is required when you want to get data from multiple data sources together, i.e. when you want to join two or more systems. Example − you will not be able to perform a single SQL query joining two tables from two physically different databases (see the sketch after this list).
  • Data extraction time slots for different systems vary as per the time zone and operational hours.
  • Data extracted from source systems can be used in multiple data warehouse systems, Operational Data Stores, etc.
  • During ETL you can perform complex transformations, and these require an extra area to store the intermediate data.
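
To illustrate the join point above, here is a hedged Python sketch; sqlite3 and all table names are stand-ins for the real source systems. The point is simply that both extracts must land in one database before a single SQL join becomes possible.

    # Two physically separate source databases cannot be joined in one SQL
    # statement, so both extracts are landed in a staging database first.
    import sqlite3

    crm = sqlite3.connect("crm.db")          # source system 1
    billing = sqlite3.connect("billing.db")  # source system 2
    staging = sqlite3.connect("staging.db")  # staging area

    crm.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    billing.execute("CREATE TABLE IF NOT EXISTS invoices "
                    "(customer_id INTEGER, amount REAL)")

    # Land both extracts in the staging area
    staging.execute("CREATE TABLE IF NOT EXISTS stg_customers (id INTEGER, name TEXT)")
    staging.execute("CREATE TABLE IF NOT EXISTS stg_invoices "
                    "(customer_id INTEGER, amount REAL)")
    staging.executemany("INSERT INTO stg_customers VALUES (?, ?)",
                        crm.execute("SELECT id, name FROM customers"))
    staging.executemany("INSERT INTO stg_invoices VALUES (?, ?)",
                        billing.execute("SELECT customer_id, amount FROM invoices"))

    # Now the join is possible, because both data sets live in one database
    joined = staging.execute("""SELECT c.name, SUM(i.amount)
                                FROM stg_customers c
                                JOIN stg_invoices i ON i.customer_id = c.id
                                GROUP BY c.name""").fetchall()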

What is SAP Data Services?

Answer

SAP BO Data Services is an ETL tool used for data integration, data quality, data profiling and data processing. It allows you to integrate and transform trusted data and load it into a data warehouse system for analytical reporting.

BO Data Services consists of a UI development interface (the Designer), a metadata repository, data connectivity to source and target systems, and a Management Console for scheduling jobs.

Explain the architecture of BODS with Job Server, Repository Manager, and Management Console?

Answer

You can divide BODS architecture into the below layers −

Web Application Layer, Database Server Layer, Data Services Service Layer.

Within this architecture, the Job Server executes the batch and real-time jobs, the Repository Manager is used to create and upgrade repositories, and the Management Console is a web application used to schedule, monitor and administer jobs.

What is a repository in BODS? What are the different types of Repositories in BODS?

Answer

The repository is used to store the metadata of objects used in BO Data Services. Each repository should be registered in the Central Management Console (CMC) and is linked with one or more Job Servers, which are responsible for executing the jobs you create.

There are three types of Repositories −

  1. Local Repository − It is used to store the metadata of all objects created in Data Services Designer, like projects, jobs, data flows, workflows, etc.
  2. Central Repository − It is used to control the version management of the objects and is used for multi-user development. The Central Repository stores all the versions of an application object, so it allows you to move back to previous versions.
  3. Profiler Repository − It is used to manage all the metadata related to profiler tasks performed in SAP BODS Designer. The CMS Repository stores the metadata of all tasks performed in the CMC on the BI platform, and the Information Steward Repository stores the metadata of profiling tasks and objects created in Information Steward.

What are the single object and reusable objects in Data services?

Answer

Reusable Objects −

  • Most of the objects that are stored in a repository can be reused. When a reusable object is defined and saved in the local repository, you can reuse it by creating calls to the definition. Each reusable object has only one definition, and all calls to that object refer to that definition. So if the definition of an object is changed in one place, the object definition changes in all the places where that object appears.
  • An object library is used to contain object definitions; when an object is dragged and dropped from the library, a new reference to the existing object is created.

Single-Use Objects −

  • All objects that are defined specifically for one job or data flow are called single-use objects, e.g. a specific transformation used in a particular data load.

What is a Data Store in Data services designer and what are different types of Data Stores?

Answer

A Datastore is used to set up the connection between an application and a database. You can create a Datastore directly or with the help of adapters. A Datastore allows an application/software to read metadata from an application or database and to write to that database or application.

There are three types of Datastores − Database Datastores, Application Datastores, and Adapter Datastores.

You want to set up a new repository in BODS. How do you create it?

Answer

To create a BODS repository, you need a database installed. You can use SQL Server, Oracle Database, MySQL, SAP HANA, Sybase, etc. You have to create users in the database while installing BODS and creating repositories; these users are required to log in to different servers such as the CMS Server and the Audit Server. To create a new repository, you have to log in to the Repository Manager.

What is a real-time Job?

Answer

Real-time jobs “extract” data from the body of the real-time message received and from any secondary sources used in the job.

How do you manage object versions in BODS?

Answer

The Central Repository is used to control the version management of the objects and is used for multi-user development. It stores all the versions of an application object, so it allows you to move back to previous versions.

You want to generate quality reports in the DS system, data validation, and documentation. Where can you see these?

Answer

Data Services Management Console

What is the template table?

Answer

In Data Services, you can create a template table to move data to the target system; the template table has the same structure and data types as the source table.

How do you check the execution history of a job or a data flow?

Answer

DS Management Console → Job Execution History

What is SAP Data Services Designer? What are the main ETL functions that can be performed in a Designer tool?

Answer

It is a developer tool used to create objects consisting of data mappings, transformations, and logic. It is GUI-based and works as the designer for Data Services.

You can create various objects using Data Services Designer, like projects, jobs, workflows, data flows, mappings, transformations, etc.

How do you check the existing objects in a DS repository?

Answer

In the object library in DS Designer.

How do you improve the performance of data flows using memory datastore?

Answer

  • You can create a Datastore using memory as the database type. A memory Datastore is used to improve the performance of data flows in real-time jobs, as it stores the data in memory to facilitate quick access and doesn't require going back to the original data source.
  • A memory Datastore is used to store memory table schemas in the repository. These memory tables get their data from tables in a relational database or from hierarchical data files like XML messages and IDocs.
  • Memory tables remain alive only while the job executes, and data in memory tables can't be shared between different real-time jobs.

What is a linked datastore? Explain with an example.

Answer

Various database vendors provide only a one-way communication path from one database to another; these paths are known as database links. In SQL Server, a linked server allows a one-way communication path from one database to another.

Example −

Consider a local database server named "Product" that stores a database link to access information on a remote database server called "Customer". Users connected to the remote database server Customer can't use the same link to access data on the database server Product; a user connected to "Customer" needs a separate link in the server's data dictionary to access data on the Product database server.

This communication path between two databases is called a database link, and Datastores that are created between these linked database relationships are known as linked Datastores.

It is possible to connect a Datastore to another Datastore and to import an external database link as an option of the Datastore.

You want to import application metadata into a repository. How can you perform this?

Answer

An Adapter Datastore allows you to import application metadata into a repository. You can also access application metadata and move batch and real-time data between different applications and software.

What are the different types of files that can be used as source and target file formats?

Answer

  • Delimited
  • SAP Transport
  • Unstructured Text
  • Unstructured Binary
  • Fixed Width

You want to extract data from an Excel workbook. How can you do this?

Answer

You can use a Microsoft Excel workbook as a data source using file formats in Data Services. The Excel workbook should be available on the Windows file system or the Unix file system.

What is the use of data flow in DS?

Answer

A data flow is used to extract, transform and load data from a source to a target system. All the transformations, loading and formatting occur in the data flow.

What are different objects that you can add to a dataflow?

Answer

  • Source
  • Target
  • Transforms

What are the different properties that you can set for a data flow?

Answer

  • Execute once
  • Parallelism
  • Database links
  • Cache

Why do you use workflow in DS?

Answer

Workflows are used to determine the process for executing the data flows. The main purpose of a workflow is to prepare for executing the data flows and to set the state of the system once data flow execution is completed.

What are the different objects that you can add to the workflow?

Answer

  • Workflow
  • Data flow
  • Scripts
  • Loops
  • Conditions
  • Try/Catch blocks

Is it possible for a workflow to call itself in a Data Services job?

Answer

Yes, a workflow can call itself in a Data Services job.

Give an example of a workflow in production?

Answer

  • There is a fact table that you want to update, and you have created a data flow with the transformation. To move the data from the source system, you first have to check the last modification date for the fact table so that you extract only the rows that have been added after the last update.
  • To achieve this, you create a script which determines the last update date and then pass it as an input parameter to the data flow (see the sketch after this list).
  • You also have to check whether the data connection to the particular fact table is active. If it is not, you need to set up a catch block which automatically sends an email to notify the administrator about the problem.
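
A conceptual Python stand-in for the first two bullets (sqlite3 and all table names are invented; in BODS this would be a script that sets a global variable, which the data flow then uses as a filter):

    # Incremental-load sketch: read the last update date from the target,
    # then extract only the newer source rows.
    import sqlite3

    source = sqlite3.connect("source.db")
    target = sqlite3.connect("warehouse.db")

    source.execute("""CREATE TABLE IF NOT EXISTS sales
                      (id INTEGER, amount REAL, modified TEXT)""")
    target.execute("""CREATE TABLE IF NOT EXISTS fact_sales
                      (id INTEGER, amount REAL, modified TEXT)""")

    # Step 1: determine the last update date (the "script" part of the workflow)
    row = target.execute("SELECT MAX(modified) FROM fact_sales").fetchone()
    last_update = row[0] or "1900-01-01"

    # Step 2: extract only rows added/changed since then (the data flow part)
    new_rows = source.execute(
        "SELECT id, amount, modified FROM sales WHERE modified > ?",
        (last_update,)).fetchall()

    target.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", new_rows)
    target.commit()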

What is the use of conditionals?

Answer

You can add conditionals to a workflow. They allow you to implement If/Then/Else logic in the workflows.

What is the transformation in Data Services?

Answer

Transforms are used to manipulate data sets: they take data sets as inputs and create one or multiple outputs. There are various transforms that can be used in Data Services.

What are the common transformations that are available in Data Services?

Answer

  • Data Integration
  • Data Quality
  • Platform
  • Merge
  • Query
  • Text data processing

What are different transformations under data integration?

Answer

  • Date_Generation
  • Data_Transfer
  • Effective_Date
  • Hierarchy_Flattening
  • Table_Comparison, etc.

What is the use of query transformation?

Answer

This is the most common transformation used in Data Services, and you can perform the below functions with it −

  • Data filtering from sources
  • Joining data from multiple sources
  • Performing functions and transformations on the data
  • Column mapping from input to output schemas
  • Assigning primary keys
  • Adding new columns, schemas and function results to the output schema

As the Query transformation is the most commonly used transformation, a shortcut for it is provided in the tool palette. A conceptual sketch follows.
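
A rough Python analogue of what a Query transform does (all field names are invented); conceptually it behaves like a SQL SELECT:

    # Query-transform analogue: filter rows, map columns, derive a new column.
    orders = [{"id": 1, "cust": "A", "amount": 120.0},
              {"id": 2, "cust": "B", "amount": 45.0}]

    output = [
        {
            "order_id": o["id"],              # column mapping: id -> order_id
            "customer": o["cust"],            # column mapping: cust -> customer
            "amount_eur": o["amount"] * 0.9,  # derived column via a function
        }
        for o in orders
        if o["amount"] > 50                   # filtering, like a WHERE clause
    ]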

What is a text data processing transformation?

Answer

  • It allows you to extract specific information from a large volume of text. You can search for facts and entities like customers, products, and financial facts specific to an organization.
  • This transform also checks the relationships between entities and allows their extraction.
  • The data extracted using text data processing can be used in Business Intelligence, reporting, query, and analytics.

What is the difference between text data processing and data cleansing?

Answer

Text data processing is used for finding relevant information in unstructured text data, whereas data cleansing is used for the standardization and cleansing of structured data. The sketch below contrasts the two.
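
A small illustrative contrast in Python (the regular expressions, sample text, and field formats are all made up):

    import re

    # Text data processing: extract entities/facts from unstructured text
    note = "Met with ACME Corp; deal size approx. $250,000."
    amounts = re.findall(r"\$[\d,]+", note)            # -> ['$250,000']
    companies = re.findall(r"[A-Z]{2,}\s+Corp", note)  # -> ['ACME Corp']

    # Data cleansing: standardize an already-structured field
    record = {"phone": "(091) 555-1234 "}
    record["phone"] = re.sub(r"\D", "", record["phone"])  # -> '0915551234'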

What is a real-time job in Data Services?

Answer

You can create real-time jobs to process real-time messages in Data Services Designer. Like a batch job, a real-time job extracts data, transforms it and loads it.

Each real-time job can extract data from a single message, or you can also extract data from other sources like tables or files.

Explain the difference between a real-time and batch job in Data Services?

Answer

  • Transforms like branches and control logic are used more often in real-time jobs than in batch jobs in the Designer.
  • Unlike batch jobs, real-time jobs are not executed in response to a schedule or an internal trigger.

What is an embedded data flow?

Answer

An embedded data flow is a data flow that is called from another data flow in the design. The embedded data flow can contain multiple sources and targets, but only one input or output passes data to the main data flow.

What are the different types of embedded data flow?

Answer

  1. One Input − The embedded data flow is added at the end of a data flow.
  2. One Output − The embedded data flow is added at the beginning of a data flow.
  3. No Input or Output − Replicates an existing data flow.

What are the local and global variables in a Data Services job?

Answer

Local variables in Data Services are restricted to the object in which they are created.

Global variables are restricted to the job in which they are created. Using global variables, you can change the values of default global variables at run time.

How are variables different from parameters in a Data Services job?

Answer

Expressions that are used in a workflow or data flow are called parameters.

All the variables and parameters in workflows and data flows are shown in the Variables and Parameters window.

What are the different recovery mechanisms that can be used for failed jobs?

Answer

  • Automatic Recovery − This allows you to run unsuccessful jobs in recovery mode, resuming from the point of failure.
  • Manual Recovery − This allows you to rerun the jobs manually, without taking the previous partial run into account.

What is the use of Data Profiling?

Answer

Data Services Designer provides a Data Profiling feature to ensure and improve the quality and structure of source data. The Data Profiler allows you to find −

  • Anomalies in source data, for validation, corrective action, and assessing the quality of source data.
  • The structure and relationships of source data, for better execution of jobs, workflows and data flows.
  • The content of the source and target systems, to determine that your job returns the expected result.

Explain the different performance optimization techniques in BODS?

Answer

The performance of an ETL job depends on the system on which you are using the Data Services software, the number of data moves, etc. There are various other factors that contribute to the performance of an ETL task −

  • Source Database
  • Source Operating System
  • Target Database
  • Target Operating System
  • Network
  • Job Server OS
  • BODS Repository Database

What do you understand by multiuser development in BODS? How do you manage multiuser development?

Answer

SAP BO Data Services supports multi-user development, where each user can work on an application in their own local repository. The team uses a central repository to save the main copy of the application and all the versions of the objects in it.

You want to perform multiuser migration in SAP BODS. How can you perform this?

Answer

  • In SAP Data Services, job migration can be applied at different levels − application level, repository level, and upgrade level.
  • You can't copy the contents of one central repository to another central repository directly; you need to make use of a local repository.
  • First, get the latest version of all objects from the central repository into the local repository. Then activate the central repository into which you want to copy the contents.
  • Add all the objects you want to copy from the local repository to the central repository.

Suppose you have updated the version of the Data Services software. Is it required to update the repository version?

Answer

If you update the version of SAP Data Services, you also need to update the version of the repository. The below points should be considered when migrating a central repository to an upgraded version −

Point 1 − Take a backup of all tables and objects of the central repository.

Point 2 − To maintain the versions of objects in Data Services, maintain a central repository for each version. Create a new central repository with the new version of the Data Services software and copy all objects to this repository.

Point 3 − Whenever you install a new version of Data Services, it is recommended to upgrade your central repository to the new version of objects.

Point 4 − Also upgrade your local repository to the same version, as different versions of the central and local repositories may not work at the same time.

Point 5 − Before migrating the central repository, check in all the objects. Since you don't upgrade the central and local repositories simultaneously, this is necessary: once your central repository is upgraded to a new version, you will not be able to check in objects from a local repository that has an older version of Data Services.

What is slowly changing dimension?

Answer

Slowly changing dimensions (SCDs) are dimensions that contain data which changes slowly over time.

How do you manage slowly changing dimensions? What are the fields required in managing different types of SCD?

Answer

SCD Type 1

  • No history preservation
  • A natural consequence of normalization

SCD Type 2

  • Preserving all history with new rows
  • New rows are generated for significant changes
  • You need to use a unique (surrogate) key
  • New fields are generated to store the history data
  • You need to manage an Effective_Date field (see the sketch after this answer)

SCD Type 3

  • Limited history preservation
  • Only two states of data are preserved − current and old
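
Below is a minimal sketch of the SCD Type 2 bookkeeping described above, using Python and sqlite3 as stand-ins. The table name, columns, and the '9999-12-31' open-end-date convention are illustrative assumptions, not BODS specifics; in BODS this logic is typically built with the Table_Comparison and History_Preserving transforms.

    # SCD Type 2 sketch: expire the current row, then insert the new version.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE dim_customer (
                      sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
                      customer_id INTEGER,                   -- natural key
                      city TEXT,
                      effective_date TEXT,
                      end_date TEXT)""")

    def apply_scd2(customer_id, new_city, today):
        current = db.execute(
            "SELECT sk, city FROM dim_customer "
            "WHERE customer_id = ? AND end_date = '9999-12-31'",
            (customer_id,)).fetchone()
        if current and current[1] == new_city:
            return  # no significant change, nothing to do
        if current:
            # expire the old version instead of overwriting it
            db.execute("UPDATE dim_customer SET end_date = ? WHERE sk = ?",
                       (today, current[0]))
        # insert the new version as the current row
        db.execute("INSERT INTO dim_customer "
                   "(customer_id, city, effective_date, end_date) "
                   "VALUES (?, ?, ?, '9999-12-31')",
                   (customer_id, new_city, today))

    apply_scd2(42, "Berlin", "2023-01-01")
    apply_scd2(42, "Munich", "2023-06-01")  # old row expired, new row current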

Is a file format in Data Services a type of datastore?

Answer

No, a file format is not a datastore type. A file format is a set of properties describing the structure of a flat file.