Data Scientist II
Honeywell
Job Description
Data Scientist II (3–6 Years Experience)
Location
Bangalore, India (Hybrid / Remote as applicable)
Role Overview
We are looking for a Data Scientist with strong analytical and machine learning skills to work on data‑driven problem solving and model development.
The role focuses on hands‑on analysis, model building, and deployment support, working closely with senior data scientists, engineers, and product teams.
You will contribute to building scalable ML solutions and help convert business problems into data science use cases.
Honeywell helps organizations solve the world's most complex challenges in automation, the future of aviation and energy transition. As a trusted partner, we provide actionable solutions and innovation through our Aerospace Technologies, Building Automation, Energy and Sustainability Solutions, and Industrial Automation business segments – powered by our Honeywell Forge software – that help make the world smarter, safer and more sustainable.
As a Data Scientist II at Honeywell, you will develop and implement advanced analytics models to solve complex business problems, collaborating with cross-functional teams to deliver actionable insights and drive data-driven innovation.
Experience
3–6 years of relevant industry experience
Key Responsibilities
Data Analysis & Exploration
- Perform exploratory data analysis (EDA) on structured and semi‑structured data
- Clean, preprocess, and transform large datasets
- Create clear visualizations and insights for stakeholders
- Write efficient and readable SQL queries for analysis and reporting
NLP & GenAI (Exposure Preferred)
- Work on NLP tasks such as text classification, similarity, and entity extraction
- Use pre‑trained models from Hugging Face or cloud APIs
- Assist in building LLM‑based applications (prompt engineering, simple RAG pipelines)
- Evaluate outputs for quality, relevance, and bias
Data Engineering & Pipelines (Good to Have)
- Consume data from data warehouses and data lakes
- Build or modify batch data pipelines using Spark or Python
- Assist with workflow orchestration using Airflow / Prefect
- Understand basic streaming concepts (Kafka exposure is a plus)
Model Deployment & MLOps (Optional)
- Package models for deployment with guidance from senior team members
- Support model deployment using REST APIs (FastAPI or similar)
- Track experiments, metrics, and models using tools like MLflow
- Monitor basic model performance and data quality post‑deployment
Collaboration & Learning
- Work closely with product managers, analysts, and engineers
- Clearly communicate findings and recommendations
- Participate in code reviews and team discussions
- Continuously learn and apply new tools and techniques
Required Skills & Qualifications
Technical Skills
- Strong proficiency in Python (pandas, numpy, scikit‑learn)
- Good knowledge of SQL (joins, aggregations, subqueries)
- Solid understanding of:
  - Statistics & probability
  - Linear regression and classification models
- Experience with machine learning libraries:
  - scikit‑learn
  - XGBoost / LightGBM (preferred)
Data & ML Tools
- Experience with Jupyter notebooks
- Familiarity with Spark / PySpark (hands‑on or project experience)
- Basic experience with MLflow or similar experiment tracking tools
- Version control using Git
Cloud & Platforms
- Working knowledge of at least one cloud platform:
  - AWS / Azure / GCP
- Experience querying data from:
  - Snowflake / BigQuery / Redshift (or similar)
- Basic understanding of data lakes and warehouses
Preferred / Nice‑to‑Have
- Exposure to PyTorch or TensorFlow
- Experience with NLP or GenAI projects
- Familiarity with Docker
- Understanding of basic data engineering concepts
- Experience working in agile teams
Machine Learning Algorithms & Techniques (Hands‑On)
Supervised Learning
- Linear Models
  - Linear Regression
  - Logistic Regression
  - Regularization (L1, L2, Elastic Net)
- Tree‑Based Models
  - Decision Trees
  - Random Forest
  - Gradient Boosting (XGBoost, LightGBM, CatBoost)
Unsupervised Learning
- Clustering Techniques
  - K‑Means
  - Hierarchical Clustering
  - DBSCAN
- Dimensionality Reduction
  - PCA (feature reduction)
  - t‑SNE / UMAP (visualization & analysis)
Time Series & Forecasting (Basic–Intermediate)
- Statistical forecasting:
  - Moving averages
  - ARIMA / SARIMA (conceptual + basic use)
- ML‑based forecasting using regression and tree‑based models
Model Evaluation & Optimization
- Cross‑validation techniques
- Hyperparameter tuning (Grid Search, Random Search)
- Bias–variance tradeoff
- Handling class imbalance
- Selection of appropriate evaluation metrics