The engineering team experience at Panaseer
Hiya, I’m Lucy and I work as a Full Stack Engineer at Panaseer. If you’re reading this post, you may already know a bit about the development experience at Panaseer — or perhaps you heard about our series B funding...Either way, the aim of this post is to give some insight into the main languages, tools, and products we use on the Engineering team at Panaseer. For each of the core areas of our development process, I will give an overview of what each tech component does, the reasons we chose it, and if there are any changes planned for the future.Join our team! We are #hiring for a number of roles across the Engineering team! If you’re interested in joining Panaseer on our growth journey and work in an environment that fosters innovation and autonomy, get in touch.Some of the role we currently have open through Workable:Data Engineer
Senior Data Engineer
Senior QA Engineer
Core Languages
Most of the Engineering work at Panaseer is done using one of our core languages. Across the Web-, Data- and Data Science platform, these are the languages and frameworks our engineers spend most of their time working with.Whilst most of them have been a core part of our work for many years, we continuously evaluate new languages for opportunities to modernise our stack and take advantage of emerging technologies.Java
We use Java 8 for most of our consumer-facing APIs. Some of our key reasons for choosing Java were:Java is the most popular language for backend applications and therefore it is easy to find developers with Java skills- Long-term support is available
- Large community of libraries and features
- Monitoring tool availability
- We are currently in the early stages of upgrading to Java 11. Looking to the future, our only planned changes in this area will be to keep upgrading versions as they become available.
Scala
Our data platform team uses Scala primarily for all programming needs when working with Hadoop, particularly Spark. Scala’s functional programming style lends itself well to concurrent programming and therefore can be very concise, however compared to Java it can be more challenging to find developers & the learning curve is quite steep.TypeScript
We use Typescript in combination with Angular 11 for the majority of our Frontend work. As our Frontend codebase grew over time, the need for more structure and stricter typing made the introduction of Typescript a no-brainer.Python
As Python is a powerful scripting language, we use it for a lot of cloud automation tasks, as the scripts tend to be relatively short & aren’t often re-used.Python is popular with our Data Science team, due to its ability to quickly iterate scripts and the ecosystem of data science libraries that Python supports.SQL
Our platform is focused on data and the insights we can draw from it — SQL is the front runner for achieving this. In the past, we have explored other technologies such as Graph & NoSql databases, but SQL has given us the best portability & compatibility we need when moving between technologies.Development Platform
The development platform across the Web and Data stack provides structure and form to our applications.They’ve been chosen for the performance they provide on high data volumes and application structure that enables us to scale in a stable & effective manner.Web Platform
Angular
Angular has been our Frontend framework of choice since the beginning of Panaseer. As a large-scale enterprise application, we found ourselves in need of an opinionated framework that can support our scale early on.Initially working with AngularJS, over time we’ve transitioned the majority of our codebase to version 11 (at the time of writing the post) — what a change that’s been.Whilst other frameworks have emerged and risen to popularity, we find Angular continuing to satisfy our development needs at scale.Most recently, we’ve continued to drive innovation on the Frontend codebase by introducing the Nx Monorepo framework. This will be a crucial step to support our next growth stage, both in terms of application size and Engineering team growth.Spring
Spring is a framework, which allows you to auto-configure standalone Java applications. At Panaseer we use Spring Boot to build all our user facing APIs.Spring Boot is a popular choice for developers using Java, as it is possible to get a production level application running quickly, without having to worry too much about configuring the application correctly and safely.Some of the main benefits of Spring Boot are:- It comes with embedded HTTP servers, such as Tomcat.
- There is no need for writing a lot of boilerplate code, annotations or XML configuration, as these are all included in the framework already.
- Spring allows you to easily connect with database services such as PostgreSQL and MySQL.
- Spring is an easy tool for junior developers to learn.
Data Platform
Apache NiFi
NiFi is a tool used to ingest data from multiple different sources. It can connect to a variety of sources, such as REST endpoints, log files & SQL. It has a drag and drop UI, making it easy to use and configure by both technical and non-technical teams.Apache Kafka
Kafka is an industry standard data streaming tool. We use it as an online state store, for data that has been ingested from NiFi. It is useful for us as the ingested data gets persisted before being processed in our platform, meaning we can re-read the data in the future as well as in real-time. Like NiFi, Kafka is flexible & integrates easily with many tools.Apache HBase
HBase is a NoSQL database, which runs on top of HDFS (a file system provided by Hadoop). It is the data layer used by our presentation API, to display results and metrics on the UI. As a NoSQL DB, it gives us the ability to do very performant random lookups and almost infinite scaling.HBase will be replaced when we move to a fully Cloud Native solution. We are in the process of investigating the possibility of using AWS Redshift and Snowflake when that time comes.Apache Phoenix
Phoenix is a SQL-like query engine that we use to query HBase data. It translates SQL queries into range scans and aggregations in HBase.Whilst it looks like SQL, it only has a subset of features, meaning that the portability of the SQL that we write, is limited slightly. When we move to Cloud Native, this will also be replaced by the SQL of technologies such as AWS Redshift or Snowflake.Apache Hive
Hive is our batch data processing layer for writing and reading data from HDFS. It also provides a SQL-like query engine and is an industry-standard tool when working with Hadoop. It is a columnar data store so is well suited for analytical queries and “big data” workflows. With the move to Cloud Native, HDFS will be replaced by Cloud storage services such as S3.Apache Spark
Spark is a highly scalable data transformation tool. It is the industry standard choice for data transformation, as it is performant and works well with other tools. The transformation interface is similar to SQL, so familiarity with SQL means there are transferable skills when using Spark. Spark provides a level of abstraction over MapReduce and allows transformations to be easily chained together.We are currently exploring the option of using AWS EMR which provides a more cloud native approach to Spark.Continuous Integration
Cypress
Cypress is an E2E JavaScript testing framework, that allows us to write functional tests for our web application. As a relatively new tool, native Cypress offers some unique functionality that other more common test frameworks such as Selenium do not provide. This includes the ability to debug using dev tools inside the test run, as well as the automatic waiting for commands & assertions, which eliminates test flakiness and removes the need for boiler plate code to manage this.When choosing Cypress, some of our core requirements were:- The ability to mock/stub requests so we could use the tool for integration testing between our Web and API layer.
- Ability to assert our visual elements are as we expect.
- Ability to capture the screen at the point of failure to make investigation of failing tests easier. Runnable within CircleCI.
- Cypress Studio — a tool that generates the test code by recording user interactions with the browser whilst running through the test scenario.
- Iframe and webkit support.
Circle CI
Circle Ci has been powering our Continuous Integration and Continuous Delivery work for the last few years. The decision to go with CircleCi was based on the out-of-the-box functionality it provides including pipeline metrics that help us identify and address build, test and release bottlenecks on a continuous basis.Gradle
Gradle is a build automation tool, used for JVM languages; we use it on both our Spring and Scala APIs. The build scripts are written in Groovy but the concepts are taken from Apache Maven & Apache Ant. However, a few of the reasons for preferring Gradle over these are:- Less bloated scripts as they are Groovy, not XML.
- Easier to read.
- Easier to add custom build code.