Intro. to Hadoop and How to install Hadoop in Distributed mode – Portuguese

Hi people,

Here is the link for this basic introduction to Hadoop and how to install it in distributed mode, I’m not going to translate it because I don’t want to in english there are some good materials of how to do it… so… xoxo, see ya!

Advertisements
Intro. to Hadoop and How to install Hadoop in Distributed mode – Portuguese

A graph? What is it?

Introduction

 

Hello, in this article we’ll discuss a little bit about graphs. This is one of the main topics that turns a newbie into an intermediate competitor in programming contests (of course in my opinion).

When I didn’t know much about graphs, or when I didn’t know to code some classic graph problems, people talked or asked questions about this topic and I just thought that:

“Graphs are a really hard topic (mainly to code),  that sometimes appear in coding competitions, and when it does I can notice it’s a graph problem, but I don’t know anything about how to implement it.”

After this article (and a bit of practice) I hope you can have a more structured thought about graphs, like:

“Graphs aren’t in general difficult, this is just something new and more advanced, and that’s why it needs previous knowledge and may take some time to get used to the idea. A lot of problems are related to graphs, almost all competitions have some graph problem.  And when you get the idea of the classic graph algorithms, most of them are simple to understand why it works and to code”.

Ps: I tough the sentence was gonna be shorter, but that is it haha.

This article has the following structure:

  • Previous knowledge required
  • Graphs, a definition
  • How to represent a graph in a computational way?
  • Bibliography and suggested readings


Previous Knowledge Required

So, here we go! Before talking about graphs, some important topics to study are:

  1. Recursion
  2. Basic Data Structures (queue, stack, priority queue)
  3. Feel comfortable in your programming language (libraries, syntax)
    PS: I plan to talk about these topics, when I do I’ll put the link here :).

If you never heard about these topics, or feel like you should know more about them, I really suggest you study it. The major troubles with graph implementations are due to not enough experiences in the topics above.

Graphs, a definition

For programming competitions the following definition is “enough” : a graph is a data-structure, a way to organize the data, defined by a set of vertices/nodes and a set of edges that are used to connect 2 vertices.
 
Graph 1
Graph 2
Graph 3
 

Okay… but why use this model? There are many situations where a graph is a great structural model, the most common examples are: roads and avenues, social networks, friend network, dependencies relations.

An example of a problem evolving graphs is the problem Toll from Brazilian Programming Olympics(OBI) of 2002:

John was the fist place at OBI and as a prize he and his family won a trip to South Korea. They rented a car to get to know the country better. The roads(1) are well maintained; all of them are two-way street(2), and two cities may be connected directly by more than one road(3). But in all of them there is a fixed value Toll(4). The father of John doesn’t have a lot of money to spend, so the car trips must be well planed.

Task: Write a program that, with the cities(5) and the roads in the country, and the city where John and his family are, find each city(don’t count the one where they already are) that can be visited, given the restriction that the father of John wishes to pay at most P Tolls(6).

 

Ok, so it’s a graph problem right? And where is the graph? Well, it’s a classic example! The roads are the edges and the cities are the vertices.
Graph represented by roads example

In the problem explanation there are some interesting words and expressions that say a lot to us about how is the graph. How so? Well, there are a lot of graph classifications and types of graphs. And it will interfere in our solution and in the graph representation.

The mainly graph characteristics

Connected graph: there is a path from each vertex to any other.
Unconnected graph: there is one or more vertices that we can’t reach from some other vertex.
Sparse graph: if the number of edges is much less than the number of vertices (if is less of  the square of the number of vertices it can be considered  sparse).
Dense graph: if the number of edges is close to the number of vertices.
Directed graph: the edges have directions. In other words, if there is an edge from A to B (A and B are vertices) it doesn’t implies in the existence of an edge from B to A.
Undirected graph: edges have no direction. In other words, if there is an edge from A to B (A and B are vertices) it implies in the existence of an edge from B to A.
Weighted graph: the edges have a weight associated to them.
Unweighted graph: the edges doesn’t have a weight associated to them. Other way to see it is that all edges have the same weight.

Some important graphs expressions

Path: sequence of vertices such that for each vertex there is only one edge for the next one.
Shortest path: shortest path that connects a vertex to another.
Degree: the degree is a vertex feature, the vertex degree is the number os edges that goes from the vertex to any other vertex.
Loops: an edge that connects a vertex to it self.
Articulation point: It’s a vertex that if removed will disconnect the graph.
neighbours: 2 vertices with an edge between them.
Bridge: edge that if removed will disconnect the graph.
Cycle: path that starts and end at the same vertex.
Component: a connected part of a graph (if a graph is connected then it has only a component).

Types of graph

Tree: a graph with no cycles and connected.
Forest: a graph with no cycles.

Simple graph: a graph undirected, with no loops and no more than one edge between 2 vertices.
Regular graph: every edge of the graph has the same degree.
Bipartite graph: a graph in which is possible to define two groups of vertices such that every edge of the graph connects vertices of different groups.
Complete graph: a simple graph in which for each vertex there is an edge connecting the vertex to every other vertex of the graph. This graph has n(n-1)/2 edges, where n is the number of vertices.I know that’s a looot to take and there is still a looooot more of graph information to know. But don’t worry about it, don’t memorize the types, you will be more familiarized with them over the time, so… Take it easy!But okay, back to the example, let’s try a challenge!! What every indexed expression in the text says about the graph (Come back to the problem explanation)?

 

 

(1) – edges
(2) – undirected graph
(3) – not simple
(4) – unweighted graph

(5) – vertices of the graph
(6) – What is the question? What is the number of vertices which the minimum path from a vertex v is equal or less to P.

How to represent a graph in a computational way?

There are 2 basic ways to do it: adjacency matrix and adjacency list. Each has some benefits and situations where it is more advantageous, in the end will see each in a more objective point of view trying to focus in when to use one instead of the other. But before let’s talk about how is these representations.

We know we must represent vertices and edges. But how to do it?

 

 Adjacency Matrix

An adjacency matrix consists of a matrix N x N, where N is the number or vertices of the graph and the lines represents the vertex from the edge goes, and the column represents the vertex to the edges ends. In a more practical way, the value 1  in  the line i, column is used to represent the existence of the edge from  to j, if such an edge doesn’t exists we use the value 0. If the graph is weighted we use an infinity value to represent the absence and the weight it self to represent the existence of an edge.

 

Graph VS Adjacency Matrix – 1

Adjacency List

 

The adjacency list represents each vertex and it’s connections in a list of vertices. If there is a connections between a vertex A and a vertex B, so in A list there will be B. In other words, we keep a neighbours list of each vertex. If the graph is weighted we also store the weight of the edge.
Graph VS Adjacency List – 1
Graph VS Adjacency List- 2
Comparing: adjacency list VS adjacency matrix
List VS Matrix
Matrix: in general it occupies less in the memory and if the graph is dense it’s the better choice, easy and efficient to verify existence of an edge between 2 vertices.
List: in general it’s more efficient in time, easier and efficient to verify who are the neighbours of some vertex.
Bibliography

 

A graph? What is it?