Posts

Business Rules Service

The Business Rules Service…

  • consumes data from Apache Kafka topics,
  • applies a set of customer-defined rules,
  • augments each record with the rule results,
  • and puts it into another, cleansed Kafka topic.

This data can then be loaded into a database or a Data Lake, so consumers work on cleansed data and can analyze which rules were applied and with what results.

In a simple end-to-end scenario, data is entered into the source system, e.g. an SAP S/4Hana system, in various ways, and the corresponding producer immediately puts the changes into the Kafka topic “SALES”.
The Rules Service is configured to take all change messages from the topic “SALES”, apply the rules to the data, and put the augmented records into the topic “SALES_CLEANSED”.
From there a database consumer loads all cleansed data, together with the rule results, into a database where the results can be visualized.
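Conceptually, this is a small consume-apply-produce loop. The sketch below shows the idea with Kafka Streams; it is not the actual implementation, and the rule, the field name, and the JSON handling are hypothetical stand-ins for the customer-defined rule set:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class RulesPipelineSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rules-service-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> sales = builder.stream("SALES");
        // Apply the rules to every change record and write the augmented
        // record to the cleansed topic.
        sales.mapValues(RulesPipelineSketch::applyRules).to("SALES_CLEANSED");

        new KafkaStreams(builder.build(), props).start();
    }

    // Hypothetical rule: check that the record contains a material number,
    // then append the rule result to the JSON payload (naive JSON handling
    // for brevity).
    private static String applyRules(String json) {
        String result = json.contains("\"material\"") ? "pass" : "fail";
        return json.substring(0, json.lastIndexOf('}'))
                + ", \"_rule_material_present\": \"" + result + "\"}";
    }
}
```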

Installation and technical details: https://hub.docker.com/r/rtdi/rulesservice

S/4Hana Connector

The S/4 Connector is a simple way of getting S/4Hana table data 1:1 into Apache Kafka. Once the data is available in Kafka, any service or data consumer can use it at whatever frequency it needs: a Data Warehouse might consume the data just once a day, other consumers within milliseconds.

Installation: https://hub.docker.com/r/rtdi/s4hanaconnector

Internally this connector uses the same techniques as SAP SLT, but in a slightly optimized manner, tailor-made for S/4Hana. It therefore does not suffer from SLT's side effects and has less impact on the source system.
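How the connector captures changes internally is not spelled out here, but trigger-based capture of the SLT kind generally boils down to database triggers filling a change-log table which a poller reads and forwards to Kafka. A rough sketch under that assumption; the log table, its columns, and the connection details are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChangeLogPollerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Assumes the Hana JDBC driver (ngdbc) on the classpath and triggers
        // on the source table that write every change into VBAK_CHANGELOG.
        try (Connection db = DriverManager.getConnection(
                     "jdbc:sap://hanahost:39015", "user", "password");
             KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             PreparedStatement stmt = db.prepareStatement(
                     "SELECT CHANGE_ID, VBELN, PAYLOAD FROM VBAK_CHANGELOG"
                     + " WHERE CHANGE_ID > ? ORDER BY CHANGE_ID")) {
            long lastId = 0;
            while (true) {
                stmt.setLong(1, lastId);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("CHANGE_ID");
                        // Key by sales document number so all changes of one
                        // document stay in order within a partition.
                        producer.send(new ProducerRecord<>("SALES",
                                rs.getString("VBELN"), rs.getString("PAYLOAD")));
                    }
                }
                producer.flush();
                Thread.sleep(1000); // poll once per second
            }
        }
    }
}
```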

File Connector

With the RTDI File Connector, the format of a CSV file is defined in a data-driven UI; all files matching a pattern are then uploaded as parsed objects and renamed to *.processed afterwards (a sketch of this loop follows the steps below).

Installation: https://hub.docker.com/r/rtdi/fileconnector

Steps:

  1. Connection: The Connection specifies the root directory where all files are located.
  2. Producer: One Producer is responsible for one file type, e.g. all files to be parsed with the schema “address”. The Producer also defines the target topic and the scan frequency.
  3. Schema Definition: The parsing settings are defined under Connection -> Schema -> Manage File Schemas.
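Conceptually, each Producer runs a scan-parse-produce-rename loop like the sketch below. The root directory, file pattern, schema columns, and topic name are hypothetical examples; the real connector drives all of this from the UI-defined schema:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FileProducerSketch {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        Path root = Paths.get("/data/files"); // the Connection's root directory
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             DirectoryStream<Path> files =
                     Files.newDirectoryStream(root, "address_*.csv")) {
            for (Path file : files) {
                // Parse each line according to the "address" schema
                // (simplified here to two columns: name and city).
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String[] cols = line.split(",");
                    if (cols.length < 2) {
                        continue; // skip malformed lines in this sketch
                    }
                    String json = "{\"name\": \"" + cols[0]
                            + "\", \"city\": \"" + cols[1] + "\"}";
                    producer.send(new ProducerRecord<>("ADDRESSES",
                            file.getFileName().toString(), json));
                }
                // Mark the file as done, as the connector does.
                Files.move(file, file.resolveSibling(
                        file.getFileName() + ".processed"));
            }
        }
    }
}
```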

SAP Hana Database

Having been part of the Hana development team, I know the details of the database quite well.

SAP Hana is a relational database like any other, but to take full advantage of it, its internal mechanisms need to be understood. I was part of the Hana development group for multiple years, so I know them in depth.

A first introduction to what makes Hana special can be found in my popular blog.saphana.com post.

Apache Kafka

Apache Kafka is already the realtime backbone for data in many companies.

In the past I learned that data can be processed either in parallel or with transactional guarantees. Kafka showed me that there is often a pragmatic middle way. Does it really matter whether patient A's or patient B's data is processed first? In many cases no, as long as all data belonging to a single patient is processed in the correct order.

Or in Kafka terms: the data is partitioned by the patient ID, and thus all data within a partition is processed in order and transactionally, in parallel to other patients’ data.
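A minimal producer sketch of this idea; the topic name and payloads are made up. Because the patient ID is the record key, Kafka hashes it to pick the partition, so all records of one patient land in the same partition and keep their order, while different patients are processed in parallel:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PatientProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition -> strict ordering for patient-4711.
            producer.send(new ProducerRecord<>("PATIENT_EVENTS",
                    "patient-4711", "{\"event\": \"admission\"}"));
            producer.send(new ProducerRecord<>("PATIENT_EVENTS",
                    "patient-4711", "{\"event\": \"lab_result\"}"));
            // A different patient may land on another partition and is
            // processed independently, in parallel.
            producer.send(new ProducerRecord<>("PATIENT_EVENTS",
                    "patient-0815", "{\"event\": \"admission\"}"));
        }
    }
}
```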

That got me hooked on Apache Kafka. Its other concepts regarding load balancing, KStreams, etc. are clever as well.

SAP with Big Data

There are many options to combine Big Data with SAP, including inexpensive ones.

SAP follows the concept of openness, hence there are many options to integrate Big Data with SAP.

The downside of this freedom of choice is picking the right approach: which products to use, which technologies are involved, what the business needs, and how to involve the users.

Examples:

  • An open-source-minded person might do all the transformations in, e.g., Apache Spark and load the results into Hana via SQL DataSets.
  • A Hana team might connect to Hadoop using the Spark Connector.
  • A SAP person might suggest using SAP Data Hub for everything.

All approaches have pros and cons. Navigating through this minefield needs a lot of background knowledge and experience.

SAP Data Integration

Pick and use the best-suited SAP product for Data Integration.

SAP provides various products in the area of Data Integration. Some focus on pure Data Integration, others on Process or System Integration. The trick is to pick the one best suited for the given task.


Of course it would be nice if one tool could solve all problems but the requirements are too diverse for that. In fact, one of the reasons there are so many options is because SAP tries to help customers and provide tools that perfectly match a typical use case. However, as soon as the use case does not match the products’ assumptions… things get hard.

On the other hand, general-purpose tools make everything harder than it needs to be. You need to find the one that fulfills your needs best.

If in doubt, we can brainstorm together. It will be much cheaper than what I have seen recently: products bought for >100k that provide no value.

SAP Hana SDI

SAP Hana Smart Data Integration is one of the hidden gems, useful for many situations.

The vision of Hana SDI was to design transformations once and let the user decide on the quality of service at activation time. The choices were:

  • Federation: Data is current because the source system is queried directly, but users unintentionally stress the source system with their queries, and the queries return at source-system speed, if that.
  • Batch ETL: Move the data at the highest possible speed into the Hana instance, so queries run at Hana speed. But the data is not current and not transactionally consistent at every point in time.
  • Realtime Push: The source system is asked to push changes to Hana, and these changes are incorporated into the Hana target tables. Hence all queries run at Hana speed and return current data. Depending on the transformation, this can be a lot of work to accomplish.

By using Hana Smart Data Access as the foundation and extending it, the first version of Hana SDI was released within a year (Nov 2014, with Hana 1.0 SPS9). It allowed adding custom Adapters and using realtime push with various sources.

The further concept of transitioning between the three integration styles was started but never completed. Only now is this idea gaining traction again with other Hana teams.

So if there are any questions regarding Hana SDI, this is your chance to talk to the very creator of the product. I am who you want to consult on this.

SAP Data Services

SAP Data Services is a very powerful ETL tool, per the Gartner Magic Quadrant. We can help you use it most efficiently.

SAP Data Services is a very powerful ETL tool, as can be seen in the Gartner Magic Quadrant for Data Integration.

I was part of the product team in various roles: Developer, Product Manager, Performance Expert, Troubleshooter, Consultant, Trainer. Hence I can provide deep insights and build dataflows very quickly.

A suggested engagement is to get the customer team started, review the work frequently, and build the most complex transformations together. This way the project team can learn while doing their jobs and will be most efficient.

During my days as product manager, one of my tasks was to help the community, answering questions in the Business Objects forum BOB. The majority of the content in the SCN Wiki was created by me.