My skills
Python
I have over eight years of experience with Python, including over four years in industry. I have worked with the following packages and frameworks:
- Web scraping: requests, requests-html, selenium
- Data extraction from PDFs: textract, pdfminer, camelot
- Data analysis: numpy, pandas
- Data visualization: matplotlib
- Connection between Python and SQL: pyodbc, SQLAlchemy
- Regression analysis: statsmodels, linearmodels
- Machine learning: tensorflow (CPU, GPU), scikit-learn
- GenAI: transformers, llama-cpp-python, whisper, gradio
SQL Server
I have over four years of experience with SQL Server, as my work as a Data Analyst at Rystad Energy primarily involved working with the database. My experience includes:
- Writing queries of any complexity
- Importing data from .csv files and MS Access databases; some experience with SSIS packages for automating data import
- Developing stored procedures for data transformations, cleaning and deduplicating the raw data, and data modeling
- Optimizing inefficient stored procedures (splitting large update queries into smaller ones, adding and removing table indexes where required)
- Using SQL Server Agent to schedule SQL stored procedures, Python scripts, and CMD/PowerShell scripts
- Connecting SQL Server to Power BI dashboards
- Implementing change management for SQL Server stored procedures using:
  - Azure Data Studio with the SQL Database Projects extension, for rebuilding the database project after making changes
  - Bitbucket Git repository, for version control
  - Azure DevOps, for managing the CI/CD pipeline that deploys changes to both the development and production servers
R
I have over four years of experience with R, all of it academic. I have used R for the following (packages used in parentheses):
- Data description and data manipulation (dplyr)
- Simple linear and generalized linear models (base R)
- Time series analysis with ARMA and ARIMA models (forecast)
- Panel data regression models with panel-corrected standard errors in the presence of heteroskedasticity (panelAR); GitHub
- Two-stage Heckman model, correcting for self-selection bias (sampleSelection); GitHub
- Data visualization (ggplot2)
- Exporting regression model results to LaTeX (stargazer)
GenAI
In recent years I have gained some experience with machine learning and generative AI, partly from my industry job and partly self-taught. This includes:
- Classifying low-resolution satellite imagery of oil and gas well pads to identify the time periods corresponding to drilling and fracking (tensorflow)
- Using whisper for local speech recognition; GitHub
- Using small, local AI models for translating text; GitHub
- Using local text-to-speech models; GitHub
- Running smaller, local LLMs, such as the Llama or Gemma families, using llama.cpp or koboldcpp