ABOUT NORCOM
Setup and operation of a high-availability big data analytics platform
The task
To provide a central analytics platform for a wide variety of use cases across different departments, a big data environment had to be designed, implemented and put into operation. To supply it with data, the environment was to be connected to various source systems, both on-premise and in the cloud, via dedicated data loading paths.
The challenge
For productive use, all relevant cluster services must be multi-tenant and highly available, and for data security reasons the data must not be kept in the cloud. Personal data additionally requires flexible rules for storage, use and deletion.
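A deletion rule of the kind described above could be sketched as a small shell helper that selects date-named partitions falling outside a retention window, e.g. as input to an HDFS cleanup job. The 30-day window and the YYYY-MM-DD partition layout are illustrative assumptions, not the customer's actual policy:

```shell
#!/bin/sh
# Hypothetical retention-rule sketch for personal data: print all
# date-named partitions (YYYY-MM-DD) older than the retention window,
# e.g. to feed into an "hdfs dfs -rm -r" cleanup job.
RETENTION_DAYS=30                                    # assumed window
cutoff=$(date -d "-${RETENTION_DAYS} days" +%Y%m%d)  # GNU date assumed

expired_partitions() {
    # Reads one partition date per line on stdin; prints those
    # lexicographically/numerically older than the cutoff.
    while read -r day; do
        d=$(echo "$day" | tr -d '-')
        if [ "$d" -lt "$cutoff" ]; then
            echo "$day"
        fi
    done
}
```

For example, `printf '2000-01-01\n2999-12-31\n' | expired_partitions` would print only the first, long-expired partition.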
Our solution
A big data environment based on Hadoop was designed, implemented and put into operation. For go-live, all relevant cluster services were kerberized and configured for high availability. An additional test cluster was established to transfer frameworks and processes validated on the development environment into regular operation, and a mirror environment was defined to ensure business continuity. For advanced analytics on large data volumes, GPU resources were integrated into the Hadoop cluster. The specialist departments were familiarized with the new platform in several innovation workshops.
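After such a high-availability go-live, a typical smoke check is that exactly one of the two HDFS NameNodes reports "active". A minimal sketch, assuming the usual HDP service IDs `nn1`/`nn2` (not taken from this project's configuration):

```shell
#!/bin/sh
# Hypothetical post-failover smoke check: exactly one NameNode must be
# active. In practice the states would come from, e.g.:
#   { hdfs haadmin -getServiceState nn1;
#     hdfs haadmin -getServiceState nn2; } | ha_ok
ha_ok() {
    # Reads the reported states on stdin, one per line, and succeeds
    # only if exactly one of them is "active".
    active=$(grep -c '^active$')
    [ "$active" -eq 1 ]
}
```

The same shape works for any active/standby pair; two simultaneous "active" states (split brain) make the check fail.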
The customer benefit
Thanks to the use of big data, a wide variety of data sources can be analyzed comprehensively for the first time. Automated mechanisms ensure the quality of the data, and thanks to high availability, evaluations are available around the clock. Transparent documentation of the architecture and processes enables the customer to solve even complex problems in self-service.
Project characteristics
Our role
Consulting / DevOps / System Administration
Our activities
Planning, installation, operation of HDP (Hortonworks) cluster environments
Designing all relevant systems (Hadoop, Postgres) for high availability
Automatic mirroring of important data between Hadoop clusters
Identity Management with integration on Hadoop (Kerberos)
Advice on technology stack
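The automatic mirroring of important data between Hadoop clusters listed above is commonly done with DistCp. A minimal sketch of how a nightly mirror job might assemble its invocation; cluster addresses, path and flags are assumptions, not the customer's actual setup, and the real job would first authenticate via Kerberos with a dedicated keytab:

```shell
#!/bin/sh
# Hypothetical helper for a nightly cluster-mirroring job: build the
# DistCp command line for a given source/target NameNode and path.
build_mirror_cmd() {
    # $1 = source namenode, $2 = target namenode, $3 = HDFS path to mirror
    echo "hadoop distcp -update -delete -p hdfs://$1$3 hdfs://$2$3"
}
```

A cron entry would then execute, e.g., `build_mirror_cmd prod-nn mirror-nn /data`; `-update` copies only changed files, `-delete` removes files that no longer exist on the source, and `-p` preserves file attributes.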
Technologies & methods
Applications: Hadoop, Hive LLAP, NiFi, PowerBI, DaSense, Oozie, Ranger, Ambari, Yarn, IPA, HAProxy, Keepalived, Postgres, PGBouncer
Databases: Hive, Postgres
Languages / frameworks: Python, Shell, SQL / Docker, CUDA, MapReduce, Tez, Spark, Kerberos, Jira, Git, UML, Jenkins
Methods: Agile, ITIL, DevOps