Brian Ray: Well, I mean, you must have some historical data in most organizations, from the folks who are working for you today. So I would start there: look at what data you have available, whether it's resumes or job descriptions or whatnot. Start with that, and then move quickly on to where you want your capabilities to go. So do you have any business objectives, as a business, where you want to get into a certain area?
Do you have any business problems that need to be solved? What sort of capabilities and skills will support that movement?
Once you have a general idea of that data, it's always a good idea to do an analytical exercise: look at which key phrases occur most frequently throughout that data, and then segment based on title and other aspects of it. And then you move on from there. Eventually, what you'll end up with through that process is a taxonomy that clearly displays the different areas your business operates in today, as well as some of the areas you want to move into.
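Sketched in code, that frequency-plus-segmentation exercise might look something like this; the records, titles, and terms below are invented for illustration, not real company data:

```python
from collections import Counter, defaultdict

# Toy stand-ins for historical HR data; titles and free-text
# descriptions here are hypothetical.
records = [
    {"title": "Data Scientist", "text": "python machine learning statistics python"},
    {"title": "Data Scientist", "text": "machine learning modeling python"},
    {"title": "Claims Adjuster", "text": "insurance claims negotiation claims"},
]

# Count key terms overall and segmented by job title.
overall = Counter()
by_title = defaultdict(Counter)
for rec in records:
    terms = rec["text"].split()
    overall.update(terms)
    by_title[rec["title"]].update(terms)

# The most frequent terms per title are raw material for taxonomy nodes.
print(overall.most_common(3))
print(by_title["Data Scientist"].most_common(2))
```

In practice the terms would come from a key-phrase extractor rather than a bare `split()`, but the segment-and-count shape is the same.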
Having clean data, in the very beginning, is essential in order to even start this exercise.
Brian Ray: Yeah, I mean, you're dealing sometimes with data that isn't as precise as other data. It's not like physics data, where you're just dealing with numbers. Sometimes when you're dealing with human data, you're dealing with titles and descriptions that need to be disambiguated. And that's something to keep in mind: there is no perfectly clean data. But when you do an exercise and you have enough quantity of data, the noise goes away and the signal starts coming up. And you start seeing clearly what traits identify certain positions and roles and things like that.
Felicia Shakiba: And so when you're diving into the intricacies of data requirements, what specific datasets would be important for this AI-driven skills taxonomy to effectively analyze? You know, what is the diverse range of data sources we would use?
Brian Ray: Yeah, good question. I mean, you can rely on your historical candidate data, which would be CVs and resumes, and also correspondence around the first lines of communication with folks you haven't hired yet. You can also look at ONET occupation codes, which are available as open source, or other occupation codes that are out there to help you segment different lines of business and different capabilities.
And it's also interesting to look at data as a timeline. So for instance, it's very difficult to do, but if I'm able to notice that someone has done a particular job for much longer than they have done other jobs, it's a good idea to weight that, to take into consideration that this person may have held this position for 10 years, versus another candidate who may have held that same position for 10 months. So you always want to contextualize the data. When you're looking at people data, you want to be able to understand it in the context of the knowledge they may have gained throughout that process.
Felicia Shakiba: I'm just thinking about the skills, or where we would house that type of data. It could be how long someone has sat in a specific role, or would you be able to capture the top three skills that someone has? You know, where would that data sit normally for an organization? Or how would we be able to capture it if an organization didn't have it?
Brian Ray (05:50): Yeah. And I think it's grouped largely by capabilities, I believe. If I stand back and look at the whole pyramid of what a business is trying to accomplish, that's really important in human capital: to understand what the business is trying to do. I like to look at it through the lens of business use cases to start, and a business use case could be something like, we want to be able to quickly adjust claims in the insurance industry.
And that's a business use case: we want to increase the ability to do that. And when I look at that pyramid, I can say, well, okay, that's a use case that has an ROI, a return on investment, around it, and has a risk around it. And I can measure those things and compare them to other use cases.
Now, where capabilities come into play: whether you're using solutions on top of that functionality or not, your capabilities as a business are your ability to address that use case and to be able to solve it, essentially.
So I have a capability of looking at documents, automated or manually. That's a capability. And then where do these capabilities derive from? They derive from skills. You're capable of doing something because you have the skills to do it. So that whole pyramid, going from use case, addressing a business problem, then solutions you may use to address that problem, and then capabilities, and then skills within there, that's an important pyramid. Most businesses are trying to get their arms around the whole stack.
They're trying to get their arms around the ability to do that, generally speaking. And they sometimes look to partners to do it, or they sometimes look to build more skills. So that means hiring, or building skills with existing staff: how do you learn?
How do you learn how to do that in your business, and then deliver it?
There are also some overarching skills that drive things. I always say project management and things like that; those are overarching skills that kind of encapsulate some of those other capabilities and help you move forward across capabilities. So those are important too, from an operational standpoint.
Felicia Shakiba: And I would imagine that leaders from various divisions would need to converge to shape this taxonomy. Those are the SMEs, the subject matter experts.
Could you share, what kind of input would an organization need from them? And how might this collective dynamic enhance the overall accuracy of what we're trying to measure? Or even predict?
Brian Ray: Yeah, I think that, you know, I call it human in the loop, to some degree. So if I'm an organization, and the capabilities sit within a certain department, let's say it's a large organization, then those are the folks who are helping solve the problem, and that may be different from the department that's receiving the solution. The people receiving that solution may be a completely different department with a completely different skill set. And the question that always comes into play is: where do those two things converge?
How do you make sure that the folks who are able to solve the problem have the right understanding of the problem they're solving, and also that the people receiving the benefit of that solution have the right understanding of whether that solution is actually going to solve their problem or not, you know, from a technical perspective?
So when you look at a taxonomy, those words may be different. You know, those words may be different when you're looking at it through the lens of being the recipient. Internal customer was a term I used to use a lot. If I'm the internal customer of that capability, I may use different words to describe what I'm getting versus if I'm a producer of that solution.
And if there are different words, but in theory they're talking about the same thing, how do you marry those?
And what we start doing is talking about groupings of these abstract concepts, and we call them topics. So if you have a topic model, you're talking about groupings of topics: you get away from the specific words, and instead you look at a pool of words that may describe a solution, or capability, or skill.
And you look at how that pool of words is used in context with the others: they're talking about the same thing. The simplest example I could give of that thinking is the typical one in natural language processing: if I were to take all the news articles from the last 10 years, and use unsupervised AI or machine learning to group them, I would notice that news articles about sports very naturally group with other news articles about sports.
And the same with news articles about the government, or politics, and so on and so forth. And the natural way those things group together is exactly how you want to look at topics when you're talking about grouping around capabilities and skills: you want to group them by frequency.
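A real topic model (LDA and the like) is more involved, but the intuition here, that documents about the same topic share vocabulary, can be sketched with simple bag-of-words cosine similarity over invented mini-articles:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term vectors.
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented mini-articles standing in for a news corpus.
docs = {
    "sports_1": "team wins game season playoff",
    "sports_2": "playoff game team score win",
    "politics_1": "senate vote bill election policy",
}
vecs = {name: Counter(text.split()) for name, text in docs.items()}

# Same-topic articles share vocabulary, so they score much higher.
print(cosine(vecs["sports_1"], vecs["sports_2"]))   # high
print(cosine(vecs["sports_1"], vecs["politics_1"]))  # near zero
```

Swap "sports article" for "skills listed under a capability" and the same similarity signal is what lets related roles and skills cluster together.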
And you want to understand, there's an area here that's very interesting. In fact, we did that for ourselves at Eviden.
We said, okay, we want to double-check to make sure that the candidates coming in are being interviewed by the right department. So we started, not deterministically, but just as advice to the humans looking at it, giving them a heads-up of where that candidate, based off the correspondence or the resume and everything else, would fall. Would they fall into the HR department themselves?
Or would they be in the legal department, or the data science department, or engineering? And we came up with some surprising results occasionally. It's like, we were looking at this candidate through this lens, they applied for this area, but they really fit better in this other area.
Maybe we should ask them why are they looking for a job change? Are they looking for a career change or are they mis-categorized?
If you see, for instance, that they're interested in AI all over their resume, but they're applying for a job as a business intelligence officer or something like that, let's just say it doesn't line up one to one, it's a good idea to go fact-check that and figure out if that person is really looking for a job in AI, or if they're looking for that business intelligence job.
And that's a good example of topic modeling, because you're understanding the general topic of the information they provided, and you're modeling around it. But it's fairly abstract, because you're only looking at these combinations of skills and words and things like that. So you're trying to understand, based off of that, what a machine would recommend for that person, what would be expected of them.
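A minimal, advisory-only version of that candidate-routing idea could look like this; the department keyword profiles and resume text are hypothetical, and a real system would use a trained topic or classification model rather than raw keyword counts:

```python
import re
from collections import Counter

# Hypothetical department keyword profiles (curated by SMEs, not learned).
profiles = {
    "data_science": {"machine", "learning", "ai", "statistics", "python"},
    "legal": {"contracts", "compliance", "litigation", "counsel"},
    "engineering": {"kubernetes", "backend", "java", "infrastructure"},
}

def suggest_department(resume_text: str) -> str:
    """Advisory suggestion only; a human reviewer makes the final call."""
    words = Counter(re.findall(r"[a-z]+", resume_text.lower()))
    # Score each department by how often its keywords appear in the resume.
    scores = {dept: sum(words[k] for k in kws) for dept, kws in profiles.items()}
    return max(scores, key=scores.get)

resume = "Built machine learning pipelines in Python; AI research and statistics."
print(suggest_department(resume))
```

The point of keeping it advisory is exactly what the transcript describes: a surprising suggestion is a prompt to ask the candidate a question, not a decision.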
Felicia Shakiba: And you're marrying that suggestion from just looking at the resume, or just grabbing the data from the resume?
Brian Ray: Right, right. You're taking the terms used in the resume and how they're used, and sometimes you contextualize them. You're looking at those keywords and seeing how often they fall into the bucket you expect them to fall into. It's just a statistical thing. And by the way, it doesn't have to be just the resume.
It could also be the introductory letter, and other correspondence they've had with the company so far, you know, the words they've used. And if you get really wise about it, and you have their permission, if you're recording, like even this recording right now, you could take all the words we've spoken so far and probably predict that we're talking about CPO PLAYBOOK. You can predict those things just because of the frequencies of the words we use.
Now, the risk is that sometimes the words used in resumes can be used to unfairly categorize someone, and you don't always want that. You sometimes want to be cautious about classifying into job roles and things like that. And there are several ways to handle that sort of thing.
One thing we like is the occupation codes I mentioned before. If you have that as a middle taxonomy, and you create your own way of mapping to it, you can make sure you're not misusing some information by blindly marrying it with a job requirement, versus being able to handle it as a skill, as opposed to words that could be found in the description of a school you went to, or something like that, that has nothing to do with their skills.
Felicia Shakiba: Interesting. Yeah, because the resume might put you in one direction, and that's really not, authentically, how they perform...
Brian Ray: Yeah, you're just blindly looking at words, and you could have words in the title of some jobs you've held, or something like that, that have nothing to do with what you want to do or where you want to go.
Felicia Shakiba: Right. So we are not currently able to decipher whether someone is a master of a skill or at a low level with it; we're really just looking at the words being used. And then, essentially, later on, it's still important to leverage the data not to make decisions, but rather to help guide us to make decisions with additional data.
Brian Ray: Yeah, generally. There are some exceptions to that. My friend John, who I went hiking with yesterday, is an orthopedic surgeon. And if you look at his CV, there are probably skills that do point towards expertise. There are years of experience, and there are certain things that do signify that this person is an MD; they have that initial in their name, and so on and so forth.
Felicia Shakiba: So levels are implied, along with some validity about what someone might be a master of.
Brian Ray (15:39): Yeah, yeah. And it could be assumed, like same with you know, any other credentialing title.
And that could happen more explicitly if the resume contains specific certifications. For instance, you know, I take some cloud certifications sometimes. If I have those, you can assume I have some level of expertise, and there's a more direct one-to-one mapping with a skill set, versus a paragraph I write in my resume about what I did on a job.
That's not as deterministic as having something explicit: I got this certification in this particular technology, or something like that.
Felicia Shakiba: Got it. And then you touched on the fairness of the data. So I'm curious to learn: what measures are you putting into place to mitigate biases or inaccuracies?
Brian Ray: Fantastic question, and we take it very seriously too. I mean, one way we've approached this is with what you can do with a job requirement and what you can do with a resume, and those are two different ends of the spectrum, right?
The requirement tells you what you're hiring for, and the resume tells you, this is what I've done.
And the ultimate goal there is that you marry those somehow. The problem with using free-form text to do that, text that hasn't been curated, that people haven't agreed on or looked at, is that it could match on any arbitrary thing. It could be mistaken.
It could be something that isn't what they meant. But if you do the due diligence of marrying those with specific codes, like ONET occupation codes, or skills you've already curated very carefully, you can generate those taxonomy attributes before the resume comes in, and you can do that for the job [requisition].
And then you're just marrying skills to skills; you're not marrying text descriptions. You're marrying the actual labels you've come up with about that position to labels about that candidate. And that's a much fairer way of doing it.
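Once both sides are mapped to curated labels, the comparison reduces to set overlap. A sketch, with made-up ONET-style codes and skill labels:

```python
def jaccard(a: set, b: set) -> float:
    # Overlap between two sets of curated skill labels.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical curated labels for a job requisition and a candidate;
# the codes are illustrative, not a real ONET mapping.
job_req_codes = {"15-2051", "15-1252", "skill:python"}
candidate_codes = {"15-2051", "skill:python", "skill:sql"}

# Two labels shared out of four distinct labels overall.
score = jaccard(job_req_codes, candidate_codes)
print(round(score, 2))
```

Because both sides pass through the same curated vocabulary first, words incidental to the candidate (a school's description, a quirky job title) never enter the comparison.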
The challenge also is that if I mention computers 15 times in my resume, and someone else mentions computers only once in theirs, does that mean I'm better at computers? No, it doesn't. So you always have to do your due diligence. Still, today, you have to figure out what that context really means. Now, is AI getting smart enough to do some of that? Yeah, it is.
But I don't think that it would be smarter without the humans reviewing that.
So I think that as we go along, we always treat machine learning and AI as assistive technology, not decisive, in that sense, so that you get the right answers and you grow with the machine, not against it.
Felicia Shakiba: And what about accommodating future skill demands? You know, a lot of organizations are planning for future talent needs and an overall workforce planning trajectory. So how might this taxonomy possess the agility to adapt and learn in order to stay ahead of upcoming skill trends? And on that note, I'm just curious if you could name the tools you're using to build this as well.
Brian Ray: Yeah, yeah, sure. First off, the concept you're talking about, we call that drift: concept drift, or, you know, skill drift in this case. And what drift is, essentially, is where the outside world changes despite the process you've already put in place.
So when you put that process in place, you've validated that those skills are relevant, and everything's working great. You can take some data, you can prove that it's working out, you can look at it again and say, okay, this works. This is working today, as I know it.
Now, the process didn't break; what changed was outside of the process. The world outside changed, right? So now there are these new skills, and it no longer makes that perfect marriage anymore, because these skills are unknown. It's unseen data, and you need a regular review process.
Now, if you're doing it automated, with machine learning and AI and things like that, then there's a very explicit retraining process that you go through.
And most practitioners are trained on that. And if it's a manual process, you have to do the same thing: a review-and-update process, as frequently as needed. How do you know how frequently it's needed? You can do some sampling along the way and see if drift starts to occur.
And you can tell, because it just starts to curve. You can measure: okay, how would this work with this new set of data? And when you see it drifting in the wrong direction, eventually you say, okay, I'm going to rebuild this thinking, or this model, and I'm going to reapply it to the new world. And then you rinse and repeat.
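A bare-bones version of that sampling check might look like this: compare accuracy measured on fresh samples against the baseline validated at deployment, and flag retraining when it drifts too far. The numbers and the tolerance threshold are illustrative assumptions:

```python
def needs_retraining(recent_accuracy, baseline, tolerance=0.05):
    # Flag retraining when accuracy sampled on new data drifts more than
    # `tolerance` below the baseline validated at deployment time.
    current = sum(recent_accuracy) / len(recent_accuracy)
    return current < baseline - tolerance

baseline = 0.90  # accuracy measured when the model was last validated
print(needs_retraining([0.91, 0.89, 0.90], baseline))  # stable, no retrain
print(needs_retraining([0.84, 0.82, 0.80], baseline))  # drifted, retrain
```

Real drift monitoring usually also watches the input distribution (new skill terms the model has never seen), but the rebuild-and-reapply loop is the same rinse-and-repeat described here.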
So the review process never goes away. Now, I mentioned human in the loop before; if you build that into your process, some of that does help. It helps when you're constantly reviewing good results as well as bad results, saying, well, I see that you identified this person as having this skill, but we found that that's untrue. Add that to my database of things.
Tool-wise, it really depends on the sophistication of the organization. If I'm an analyst just out of college, and I picked up some Python in college and want to use some language processing tools off the shelf, I could use LDA and LDAvis, and NLP libraries like NLTK.
And I could use some very basic tools to do some general analysis. But that changes drastically when you're talking about, in my case, a 100,000-person organization, where you need really solid, sophisticated tools that can be automated and updated.
And that's when you start getting into cloud technology; Google Cloud Platform, AWS, and Microsoft are some of the vendors, and there are open source tools as well.
And then you get into the Gen AI space, which is fairly new and could be useful here, by the way. I can classify resumes better with Gen AI, which is very relevant here, and I can also interact with candidates differently. I'm talking about the large language models that have been released recently. There are various flavors of those: some are proprietary and some are open source.
It's complicated to keep up with, and it's fast-moving, but the value proposition is there. You can interact differently with candidates and also classify documents more accurately with less data, and things like that.
So there is a value proposition to that tool set, but it comes with a cost to get up to speed on it, and a cost as you run it. The value's there, so many businesses have to gather around it quite a bit and figure out whether it's worth it in some cases. They have to pick the right use cases, and they also have to look at the risk involved.
So there's risk involved with some of these processes, the predictions, right, so any prediction has risks.
They could be wrong, but you've got to figure out how that's going to impact this type of thing. And the last thing I should mention is that regulation could likely change too. Regulations have already changed in Europe around this space, and it's changing very quickly here in the US as well.
Felicia Shakiba: I could only imagine where we will be a year from now regarding regulations. But I think it's still exciting to take the journey.
And so there's another thought I have, which is: once the taxonomy is built, and it serves as an identification tool to fill current roles or even future roles within the organization, on that line of thinking, would this AI taxonomy personalize skill development recommendations? And would it align with personal and organizational objectives?
Brian Ray (23:34): Yes, I love this topic. So we did some work for some state departments, and we were looking at helping people find jobs.
Now, in theory, I can see where there's a gap pretty easily. And if I'm really smart, and I really want to help the population find jobs, I would do just that. I would say, okay, I'm seeing a skill gap here; I see this person applied for this job. Let me create another aspect of this taxonomy that has more to do with recommended trainings on how to get there, how you build that skill.
Should I go back to school? Should I take a course? Should I work on my speaking skills on my own? Or do I just need to elaborate more on my next rendition of my resume?
If I found a potential experience that you could elaborate on, maybe that would fill that gap too. But absolutely, there is a really helpful aspect to understanding, statistically, what's missing there. That's real helpful.
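That gap-to-training idea can be sketched as a simple set difference against a curated catalog; the skills and course names below are invented for illustration:

```python
# Hypothetical catalog mapping a missing skill to a recommended course.
catalog = {
    "public speaking": "Communication fundamentals course",
    "sql": "Intro to databases course",
    "gen ai": "Generative AI overview course",
}

def recommend_trainings(required: set, held: set) -> dict:
    # The gap is the required skills the person does not yet hold; each
    # gap skill maps to a recommended training where one exists.
    gap = required - held
    return {skill: catalog.get(skill, "no course on file") for skill in sorted(gap)}

required = {"sql", "gen ai", "python"}
held = {"python"}
print(recommend_trainings(required, held))
```

In a real taxonomy the required set would come from the role's curated skill labels and the held set from the candidate's, so the recommendation layer is just one more branch hung off the same structure.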
So, like, I would like to know that for myself too. What am I missing in my training and development plan to get to... I just recently took on a new role at Eviden that's global and pretty wide-sweeping. Well, please tell me, what skills do I need to work on to do my job better? You know, I would do it in a heartbeat.
It's hard to figure out sometimes, if you're not getting any recommendations or assistance on that. And if a machine can do it for me, I would probably take it up on it. I would say, okay, well, if it points me to a specific course, especially one that only takes a few hours and that I can go online and take for free...
Even better yet, you know. So I think you're onto something here; I think it really allows you to understand what next steps should be taken. In fact, that's one of the models we develop sometimes: next best actions, NBA (not the basketball kind). Next best actions allow you to help predict what actions a person should take next.
And there's no reason that can't be someone looking for a job.
Felicia Shakiba: That's awesome. I guess that's the future.
Brian Ray: Yeah, absolutely. Well, it's really helpful in that sense, too, because then you're also getting to see, more honestly and transparently, your potential competition in the workspace, right?
What about the person who didn't achieve this job; what did they do differently? People keep asking, well, why can't I get this job? Why can't I be promoted, even if I have a job? Why can't I get a raise? People keep asking these questions. And to have some logical, statistical basis that gives them a reasonable answer is extremely helpful for alleviating that frustration, so they can do something about it and go address it personally.
Or organizationally: if I notice that a lot of folks are missing the skills needed to build those capabilities, I could survey that. I could figure out, through a learning and development process, what people are lacking, and I could start treating that as an initiative. We're doing that right now: we're saying we want 25,000 Eviden employees very knowledgeable about Gen AI by Q2. And how?
So how do you do that? Well, you've got to know where you're at today, and you've got to know what's available. It's a bunch of tricky stuff to do that. But I believe a lot of organizations are in the same boat we are.
Felicia Shakiba: Brian, this was very insightful. I'm so glad we had you on the show. Thank you so much for your time.
Brian Ray: You bet, thanks for having me. I'm looking forward to hearing it.
Felicia Shakiba: That's Brian Ray, Global Head of Data Science and Artificial Intelligence at Eviden.