Alexa Translations is hiring a Data Engineering Lead (Worldwide)

Data Engineering Lead
🏢 Alexa Translations
💵 $100k-$150k
📍Worldwide
📅 Posted on Jun 29, 2024

Summary

The role is for a Data Engineering Lead at Alexa Translations, responsible for managing a team of data specialists, developing data collection strategies, and ensuring high-quality bilingual data sets for the company's AI engine. The ideal candidate has a background in data engineering, fluency in English (French is an advantage), and experience with data crawling tools, ETL processes, and big data platforms.

Requirements

  • Bachelor's degree (Master's Degree is an advantage) in Computer Science, Data Science, or a related field
  • At least 3 years of experience working in a data engineering department, preferably as a Senior Data Engineer in a fast-paced environment and complex business setting
  • Experience in data collection, cleaning, and management, preferably in a linguistic or translation-related field
  • Fluency in English (French is an advantage), with excellent written and verbal communication skills in both languages
  • Strong analytical and problem-solving skills, with the ability to work with large and complex data sets
  • Extensive hands-on experience with data crawling tools (e.g., Scrapy) and techniques, including the ability to develop customized crawlers (a minimal crawler sketch follows this list)
  • Demonstrated experience building and maintaining reliable, scalable ETL on big data platforms, as well as experience working with varied forms of data infrastructure, including relational (SQL) databases and Spark
  • Experience in data warehousing, including dimensional modeling concepts, and proficiency in scripting languages such as Python and Perl
  • Deep knowledge of data mining techniques and of relational and non-relational databases
  • Sound understanding of building complex ETL pipelines using either open-source tools such as Mage, Luigi, and Airflow, or cloud-based solutions such as AWS Glue
  • Highly proficient in the use of MS Office products (Word, Excel, PowerPoint)
  • Ability to perform complex data analyses with large data volumes
  • Strong knowledge of Linux, OS tools, and file-system-level troubleshooting
  • Substantial experience working with big data infrastructure tools such as Python, SQS, and Redshift
  • A suitable candidate will also be proficient in Scala, Spark, Spark Streaming, and AWS
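
For illustration, here is a minimal sketch of the kind of customized crawler referenced above, assuming Scrapy and a hypothetical site that publishes aligned English/French segments; the seed URL, CSS selectors, and field names are placeholders, not Alexa Translations' actual setup:

    # Minimal illustrative Scrapy spider for collecting aligned EN/FR segments.
    # The seed URL and all selectors are hypothetical placeholders.
    import scrapy

    class BilingualPairSpider(scrapy.Spider):
        name = "bilingual_pairs"
        start_urls = ["https://example.com/glossary"]  # placeholder seed URL

        def parse(self, response):
            # Each table row is assumed to hold one aligned sentence pair.
            for row in response.css("tr.segment"):
                yield {
                    "en": row.css("td.en::text").get(default="").strip(),
                    "fr": row.css("td.fr::text").get(default="").strip(),
                    "source_url": response.url,
                }
            # Follow pagination, if the site exposes a "next" link.
            next_page = response.css("a.next::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)

A spider like this could be run with "scrapy runspider bilingual_spider.py -O pairs.jsonl" to dump candidate pairs as JSON Lines for downstream cleaning and alignment checks.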

Responsibilities

  • Leading and managing a team of data specialists responsible for crawling domain-specific bilingual data for training the AI engine
  • Developing and implementing data collection strategies to ensure the acquisition of high-quality bilingual data sets
  • Overseeing the cleaning and preprocessing of crawled data to remove noise and ensure accuracy
  • Collaborating with other teams, such as engineering and linguistics, to understand data requirements and optimize data collection processes
  • Monitoring data quality and performance metrics, identifying areas for improvement and implementing solutions
  • Staying up-to-date with industry trends and best practices in data collection, cleaning, and management
  • Designing, deploying, and maintaining the business’s data platforms
  • Owning and extending the business’s data pipeline through the collection, storage, processing, and transformation of large data sets
  • Participating in design discussions and providing insights and guidance on database technology and data modeling best practices
  • Developing and managing scalable data processing platforms used for exploratory data analysis and real-time analytics
  • Building a metadata system where all available data is maintained and cataloged
  • Developing ETL processes that convert raw data into formats usable by the data analyst team in dashboards and charts
  • Retrieving and analyzing data using SQL and Excel, among other data management systems
  • Building data loading services to import data from numerous disparate sources, including APIs, logs, and relational and non-relational databases
  • Developing reliable data pipelines that translate raw data into powerful, useful data points (a minimal pipeline sketch follows this list)
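
As an illustration of the pipeline work described above, here is a minimal Apache Airflow DAG sketch with placeholder extract, clean, and load steps (Airflow is one of the orchestration tools named in the requirements); the DAG id, schedule, and task bodies are assumptions, not the company's actual workflow:

    # Minimal illustrative Airflow (2.4+) DAG: extract -> clean -> load.
    # Task bodies are placeholders; the DAG id and schedule are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Placeholder: pull newly crawled bilingual segments from a staging area.
        pass

    def clean(**context):
        # Placeholder: deduplicate, strip markup, and drop misaligned pairs.
        pass

    def load(**context):
        # Placeholder: write cleaned pairs to the warehouse (e.g., Redshift).
        pass

    with DAG(
        dag_id="bilingual_etl",           # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                # Airflow 2.4+ scheduling argument
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_clean = PythonOperator(task_id="clean", python_callable=clean)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_clean >> t_load

Splitting the stages into separate tasks keeps each step independently retryable and observable, which is the usual reason to use an orchestrator rather than a single script.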

Preferred Qualifications

  • Working knowledge of CAT tools such as memoQ, SDL, and Memsource
  • Desire to continue to grow professional capabilities with ongoing training and educational opportunities
  • Experience with AWS cloud applications and services
