Senior Data Engineer
Clearance: TS/SCI with ability to obtain a Polygraph within a reasonable period of time
Location: Chantilly, VA
We are looking for creative team members who are ready to take on the challenge of a Senior Data Engineer role: helping mitigate insider threats by taking data from multiple sources in any format (structured or unstructured), transforming it into interpretable fragments, and allowing engines to categorize, quantify, distill, and display results for human analysts to interpret.
The Senior Data Engineer will support insider threat capabilities by automating data integration and collection strategies, helping expand and optimize the data ingestion pipeline architecture, and developing strategies for efficient ingestion, processing, storage, structuring, and access. In addition, the Data Engineer will support data analysts, data scientists, and big data engineers in identifying data sources, performing exploratory data analysis, developing data models, and ensuring data cleanliness and accuracy to provide new insider threat behavioral insights. Please keep reading…
Roles and responsibilities potentially include:
Support the data science team by designing, developing, and implementing scalable ETL processes that load disparate datasets into a Hadoop infrastructure
Design, develop, implement, and maintain data ingestion processes for disparate datasets using StreamSets (experience with StreamSets is not mandatory)
Develop processes to identify data drift and malformed records
Develop technical documentation and standard operating procedures
Lead technical tasks for small teams or projects
Required Experience and Qualifications:
A Bachelor’s degree in a STEM (Science, Technology, Engineering, Mathematics) field plus 8 years of experience, or a Master’s degree plus 6 years of experience.
Desired Experience and Qualifications:
Working knowledge of entity resolution systems
Experience with Hadoop and Hive/Impala
Experience with messaging systems like Kafka
Experience with NoSQL and/or graph databases like MongoDB or ArangoDB
Experience with any of the following database technologies: SQL, MongoDB, Oracle, Postgres
Working experience with ETL processing and Python
Working experience with data workflow products like StreamSets or NiFi
Working experience with Python RESTful API services and JDBC
Experience with Cloudera Data Science Workbench is a plus
Understanding of PySpark
Leadership experience
Creative thinker
Ability to multi-task
Excellent understanding and application of data engineering concepts, principles, and theories