Tools & Practices

Tipping into the basics of Artificial Intelligence: Lessons from Jonathan Soma

Professor Jonathan Soma provides essential insights into Artificial Intelligence, suitable for any journalist, newcomer or seasoned professional, presenting an optimal entry point for anyone in the tech field.

“It is something that people think of as being very technically advanced and very math heavy, and you have to be a fancy programmer to understand it all, or a computer science person. But it is not true. Anyone can do it”. Jonathan Soma, Knight Chair Professor of Professional Practice in Data Journalism and Director of the LEDE professional program at the University of Columbia, can be described as the person of the ‘middle ground’ for conversations regarding Artificial Intelligence (AI). Polite and mildly enthusiastic for everything that has to do with data journalism, he makes understanding AI a stroll in the sun.

It was especially sunny when we met during iMEdD international Journalism Forum 2023. He had already talked an entire day about this recent technology, having been part of the panel “AI in the newsroom: A debate on AI’s role in the media” and done a workshop for “Making and breaking AI in the newsroom“. In the garden of Piraeus 260, he shares with iMEdD how anyone can start with AI.

Getting past the confusion

Knowing the basics means knowing what you must deal with, and as Soma says, this is not always easy. ‘The more I talk about AI and the more conversations I have with people, the more I come to the conclusion that we all have different definitions of it, even computer science people”. As Soma says, “it is a system where a computer can do ‘fancy stuff'”, like performing a task that requires human intelligence, often described as the evolution of machine learning. But now, the computer can learn on its own. For those seemingly complicated definitions, IBM’s YouTube videos on AI provide a deep dive into basic terms through simple explainers.

With the appearance of ChatGPT at the end of 2022, understanding the mechanics of generative AI became more complicated yet important for the conversation on AI. ChatGPT made the technology accessible to every user, every news organization, every journalist. “Generative AI is when a computer creates things” Soma explains. “Historically, machine learning in AI has produced something small, maybe it’s a few words, or maybe it’s a category, but generative AI will produce things that we generally associate with human creativity”. A little magical, but mostly logistically based, the generative power ranges from outputs like images and videos, to code. But its greatness often derives from its easiness with language.

And this is where the generative AI chatbots like ChatGPT come in.

Jonathan Soma presents the workshop "Artificial Intelligence and Journalism: Making and breaking AI in the newsroom".  He stands between two screens with notes and holds the microphone. He is wearing a plaid shirt. The audience appears in the foreground, out of focus.
Jonathan Soma during the workshop “Making and breaking AI in the newsroom” Photo by: Alex Grymanis

Chat with GPT

“The first step is always starting from ChatGPT because you can have a conversation and go back and forth”. The Financial Times, on their article “Generative AI exists because of the Transformer” for Large Language Models and Transformers -the basic mechanics of the generative chatbot technology — sophisticated text, images and computer code at a level that mimics human ability”. So, starting a conversation with a chatbot can help one understand the way AI tools work, how to interact with them and what information they need. It is a great way to have first contact with the technology and experiment with unlimited for the user options. Try “feeding” it with one of your stories to make a summary, write the lead, or ask about a general subject for it to provide possible new angles.

In February 2023, Reuters wrote that OpenAI’s ChatGPT is the second fastest growing user base. ChatGPT 3.5 is free but the newest update, ChatGPT 4, does have a cost. If wanting to expand one’s horizons, Anthropic’s Claude2 and Google’s Bard are Large Language Models (LLMs) which have similar ways of producing content but have different aspects such as word count, web browsing and data analysis abilities, that may make them suitable for each user.

Twenty years ago, it would have taken an army of interns. Ten years ago, it would have taken you talking to a data scientist to train a custom model. Now it is just you and Google Sheets and ChatGPT and you can do it in an afternoon.

Jonathan Soma, Knight Chair Professor of Professional Practice in Data Journalism at the University of Columbia

Python, Python, Python

When interacting with a computer, the only way to get truly intimate is learning how to code. “For better or worse, Python is the most common programming language for doing data science. It is readable and easy to learn, and as a result, every single tutorial that you will find about how to do AI stuff is in Python”. That is why the professor has created his own little tutorial for those interested in learning the coding language. Python is not (just) for Unicorns is a good learning and exercise tool for journalists of all backgrounds. It may need a couple of repetition times to get the hang of it, but the light tone of the teaching makes the learning experience enjoyable.

Knowing how to use Excel, Google Sheets or other kinds of spreadsheets will help, especially when working with big data. It makes management easier due to its filtering function as well as helping with analysis with formulas and pivot tables.

Professor Jonathan Soma on the panel "Artificial intelligence in the newsroom: A discussion on the role of artificial intelligence in the media". He is speaking into the microphone and is among his panelists, who are facing him. He is wearing a plaid shirt, glasses and, around his neck, a yellow forum badge. The frame is tilted to the right.
Professor Jonathan Soma on the panel “AI in the newsroom: A debate on AI’s role in the media”. Photo: Alex Grymanis

Experimentation time? Classify your documents

Now that basics were covered, it is time to start using AI for projects. “Classification is one of the most traditional types of machine learning, which is related to AI, and I think generative AI works well with it” says the professor. “You have a bunch of documents, pictures, anything. The question is ‘what category does this belong in?’. So you ask AI. Normally, the systems had to be trained to answer these questions by giving them many examples, naming the categories, and doing the classification. But now, with generative AI, because the software has read everything on the Internet and it knows so much already, you can just say ‘Is this legislation about gun control, the environment, immigration?’. And because it knows what gun control and immigration and the environment are, it can put it in those categories for you”. He has already made his case for traditional classification techniques on his website with a step-by-step guide on how to start in programs like scikit-learn, a machine learning library that uses Python to classify data. But copy-pasting documents on ChatGPT (or other chatbots with a greater word count) and prompting the bot, can help you classify them to a given category in an instant.

And this is where your knowledge with spreadsheets comes in handy. Soma seems excited with a new trick of plugging Google Sheets with ChatGPT. “You can have one column that is the name of the bill or just some sort of text or a comment. And then you just make a new column, and you say ‘classify’. Twenty years ago, it would have taken an army of interns to read all the legislation or look at all the images. Ten years ago, it would have taken you talking to a data scientist to train a custom model. Now it is just you and Google Sheets and ChatGPT and you can do it in an afternoon”.

As the world of AI expands and grows so are things this technology will be able to do. But when entering the world of tech, it is important to know the risks and traps that may come your way. “There are many levels to come in to understand. You could be trying to understand it from technical backgrounds. But this does not teach you the downsides of working with the tools, and it does not teach you the practical sense of interacting with them”. Jonathan Soma notices that many, when starting, use chatbots like search engines and ask them for facts. “It hates to say, ‘I do not know’ in the same way that a person hates to say, ‘I do not know’”, he explains and continues: “the best thing you can do is always think of a tool like ChatGPT the same way that you would think of any other source” and, as a journalist, you would not trust everything your source says.

Λογότυπο Άδειας Χρήσης Creative Commons Non Commercial International