Exploring MNIST with TensorFlow, from the very beginning, to an Android Application

Hi everyone,

I’ve recently started studying TensorFlow, and it is such a great framework, and TensorBoard is amazing.

But I’m not here to say how awesome it is (you already know that); I’m here to talk about some of the things I’ve been doing with TensorFlow.

Basically, the HelloWorld of TensorFlow is to build a model to classify MNIST digits, so I created a repository with a lot of links to cool MNIST tutorials, some simpler and more direct, others more complex. I also created a repository with a handwritten digit classifier for Android and instructions on how to build one on your own.


Hope it helps somehow, have a nice day!


About Outreachy

Hi everyone,

I’ll be quick in this post and just talk a little about what has changed since the last post about Outreachy.

So:

  1. I finished the implementation ( \o/ yeeey), but the code hasn’t been merged yet ( :/ not so yeeey). This was mainly because of the new stable version (Ocata), whose creation started at the end of January, which means no new feature could be merged. But the freeze has ended, the winter is not coming, so the code should be merged soon.
  2. I’ve blogged a little about Mocking, which was a new concept I had to learn in order to make good unit tests for my patch.
  3. I tested my code manually:
    1. ran Hadoop, Spark and Storm jobs in multiple scenarios, fixing some bugs
  4. And currently, I’m reviewing code and making other contributions to the community (adding code, tests, fixing bugs)!

A fast introduction to Mocking with Python

Hey there,

This post’s goal is to talk a little about Mocking and how to do it (hopefully well) in Python. These are the main topics of this article, have fun!

  1. Are you mocking me, what is mock?
  2. Mocking use cases
  3. How to mock in Python?
  4. References

Are you mocking me, what is mock?

 

Mocking is basically abstracting away some part of your code (a function call, a variable, an API request, a system call, …): instead of calling the real thing, you just pretend that it is there and that it works correctly.

Why do you want this?

Mocking is mainly used in unit testing. Why do we run unit tests? To be more confident that some particular, well-defined part of your code is working as it should. But what if part A of the code, the one you’re interested in testing, depends on a complex part B? You definitely don’t want the unit test exercising the complex part B, but mainly part A, and the complex part B will have its own unit tests, right? So how can we abstract the complex part B away? That’s right, by mocking it!!!

“In short, mocking is creating objects that simulate the behavior of real objects.” [2]
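
To make this concrete, here is a minimal sketch (the function names and the URL are made up for this example): fetch_user plays the role of the complex part B, greeting is the part A we want to test, and the test patches part B away so it never touches the network.

import mock          # on Python 3, unittest.mock works the same way
import requests

def fetch_user(user_id):
    # "complex part B": a real HTTP call we don't want in a unit test
    return requests.get("https://api.example.com/users/%d" % user_id).json()

def greeting(user_id):
    # "part A": the simple logic we actually want to test
    return "Hello, %s!" % fetch_user(user_id)["name"]

# the "unit test": replace part B with a mock so only part A is exercised
with mock.patch("__main__.fetch_user", return_value={"name": "Marianne"}):
    assert greeting(42) == "Hello, Marianne!"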

Mocking use cases

 

There are a lot of valid use cases for Mocking; all of the ones I’ll mention are related to unit tests, because that is the main general use case for Mocking.

  • As I said in the last section, a good reason to mock a part of your code is to keep unit tests isolated and concise. It’s not a good thing to have a unit test that tests a lot of things at once. You’ll probably prefer to break this big unit test into smaller ones, and mock offers a cool way to do it!
  • Sometimes part of the code you want to test is just not working as it should (not ready yet, unstable, unavailable for some reason, failing); mocking, in this case, is a good thing because you can still test the already stable part of the code.
  • Actually running some part of the code can be too costly or have undesired side effects (API requests, creating files, writing to files, system calls), so mocking can really help you test the code without worrying about these.
  • Test unpredictable code: good unit tests always expect the same values, so random values can be mocked to avoid randomness in the unit tests (see the sketch right after this list).
  • Mock a complex object that you don’t really need in its entirety: you can mock only the parts that interest you in the context.
  • And many more cases….
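
Here is a minimal sketch of the “unpredictable code” case mentioned above (roll_dice is made up for this example): patching random.randint makes the test deterministic.

import random
import mock   # on Python 3, unittest.mock works the same way

def roll_dice():
    # unpredictable code: a different value on every call
    return random.randint(1, 6)

# patch the source of randomness so the test always sees the same value
with mock.patch("random.randint", return_value=4):
    assert roll_dice() == 4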

Important: Mocking is very powerful, but doing it “the right way” is not that simple and involves practice and experience; a lot of mocking (especially bad mocking) can actually make your unit tests less effective.

How to mock in Python?

> Using the mock package [5].

What can you do with the mock package?

 

  • Pretend to be anything you want!
import mock

anything = mock.Mock(return_value=3)
print anything() # 3

“Mock is a flexible mock object intended to replace the use of stubs and test doubles throughout your code.” The Mock class can be used to represent basically anything in the code; in the example above it represents a function that returns 3. Other useful things you can define for Mock objects are:

  • “side_effect: A function to be called whenever the Mock is called. See the side_effect attribute. Useful for raising exceptions or dynamically changing return values. If side_effect is an iterable then each call to the mock will return the next value from the iterable.”

  • “return_value: The value returned when the mock is called. By default this is a new Mock (created on first access). “

  • “name: If the mock has a name then it will be used in the repr of the mock. This can be useful for debugging. The name is propagated to child mocks.”

An alternative for Mock Class is MagicMock, “MagicMock is a subclass of Mock with default implementations of most of the magic methods. You can use MagicMock without having to configure the magic methods yourself.”
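
As a quick sketch of these attributes (the values below are arbitrary): side_effect can cycle through results or raise exceptions, and MagicMock already supports magic methods such as __len__.

import mock

# side_effect as an iterable: each call returns the next value,
# and an exception in the iterable is raised instead of returned
flaky = mock.Mock(side_effect=[1, 2, ValueError("boom")])
print(flaky())   # 1
print(flaky())   # 2
# a third call would raise ValueError("boom")

# MagicMock comes with the magic methods already configured
magic = mock.MagicMock(name="my_magic")
magic.__len__.return_value = 7
print(len(magic))   # 7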

  • Pretend to be a particular thing (object, function, call, …)
from mock import create_autospec

def any_function(a, b, c):
    pass

mock_function = create_autospec(any_function, return_value='unicorn')
print mock_function(1, 2, 3)
# returns 'unicorn'
mock_function.assert_called_once_with(1, 2, 3)
print mock_function('wrong arguments')
# error: the call doesn't match any_function's signature

“The mock.create_autospec method creates a functionally equivalent instance to the provided class. What this means, practically speaking, is that when the returned instance is interacted with, it will raise exceptions if used in illegal ways.”

from mock import patch

class SomeClass(object):
    def __init__(self):
        pass

@patch('__main__.SomeClass')
def function(mock_class):
    return mock_class is SomeClass

print function()  # True

The example above uses “patch() as a function decorator, creating the mock for you and passing it into the decorated function”. patch() will replace the class (or method) with a MagicMock while the decorated function runs.
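
patch() can also be used as a context manager, which is handy when the replacement should only last for a few lines; here is a small sketch (patching time.time is just an example target):

import time
from mock import patch

# inside the with block, time.time() is a MagicMock with a fixed return value;
# outside of it, the real function is restored automatically
with patch("time.time", return_value=1234.0):
    print(time.time())   # 1234.0
print(time.time())       # back to the real current timestamp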

 

References

 

[1] https://www.toptal.com/python/an-introduction-to-mocking-in-python

[2] http://stackoverflow.com/questions/2665812/what-is-mocking

[3] http://www.voidspace.org.uk/python/mock/mock.html

[4] https://docs.python.org/3/library/unittest.mock.html

[5] http://blog.thedigitalcatonline.com/blog/2016/03/06/python-mocks-a-gentle-introduction-part-1/#.WH9emXUrKkB

 

 

 


Hey, How is Outreachy going?

Hey there!

First of all, sorry for the long time without writing on the blog (especially not writing about Outreachy), but anyway… Happy 2017, and we’re back on track!

So, the last Outreachy post was about how to run a MapReduce job using DevStack and different types of data sources as input and output. In this article, I’m gonna talk a little about the things I’ve done until now in this internship and what I’m currently doing.

As I said before, my internship project is to refactor some data source/job binary related code in order to create a clean abstraction that will allow new developers to implement new data source types and job binary types more easily. So, how are things so far?

  1. My first task was to learn a little more about stevedore and how Sahara uses it
  2. Then I set up the DevStack environment in order to run MapReduce jobs and to learn how to do that with multiple types of data source (just to understand how the end user does it)
  3. One of the major tasks I had to do was to create the abstraction itself; it demanded a lot of code reading and grepping all over the Sahara repository. The result of this task was the creation of the spec and blueprint of the project. These are still being improved along with task 4.
  4. Develop the abstraction and the data source/job binary types implementation, and of course, refactor the code in order to adapt for these abstractions. This is what I’m currently working on and the code can be seen here.

I hope this post was helpful somehow, and have a nice day!


Running Jobs with DevStack and OpenStack – Tutorial

UPDATED: I added Spark jobs and Storm jobs to this tutorial, hope it helps!

This is a very practical tutorial about how to run MapReduce, Spark and Storm jobs with multiple types of data sources (Manila, HDFS and Swift) using DevStack or OpenStack.

If you have an OpenStack cloud, just ignore all the DevStack related worries (like enabling connection and setting up DevStack).

This post considers that you have some familiarity with OpenStack, DevStack and Sahara. If you don’t feel that you have this familiarity don’t worry, this should help you as well and you can look in some references at the end of this post :)!

For this tutorial I used a VM with Ubuntu 14.04, 12 GB of RAM, 80 GB of disk, 8 vCPUs, and the DevStack master branch from 12/16/2016. Also, the plugins used were Hadoop 2.7.1 (this tutorial can be easily adapted for any Hadoop version below it), Spark 1.6.0 and Storm 0.9.2.

Main sections of this post:

  • SETUP DEVSTACK
  • ENABLE COMMUNICATION BETWEEN DEVSTACK AND INSTANCES
  • RUN A MAPREDUCE JOB
  • RUN A SPARK JOB
  • RUN A STORM JOB

Setup DevStack

First of all, be sure to start DevStack in a VM instead of a real machine.

  1. SSH to the VM
  2. clone DevStack
    1. git clone https://git.openstack.org/openstack-dev/devstack
    2. cd devstack
  3. create a local.conf file in the DevStack folder:
    here is a local.conf example: it should enable Manila, Sahara and Heat, which are the projects we’re going to need in this tutorial (a minimal sketch is also shown right after this list). There are a lot of templates for how this file should look; one of them, shown in [4], is a local.conf with Sahara.
  4. ./stack.sh
  5. Go get some lunch or something, it’s going to take some time
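
For reference, here is a minimal local.conf sketch of the kind meant in step 3. It is only an assumption based on DevStack’s standard plugin mechanism (the passwords are placeholders), not the exact file used for this tutorial:

[[local|localrc]]
ADMIN_PASSWORD=nova
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD

# enable the projects we need in this tutorial
enable_plugin sahara https://git.openstack.org/openstack/sahara
enable_plugin heat https://git.openstack.org/openstack/heat
enable_plugin manila https://git.openstack.org/openstack/manila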

If you have an error like this:
“E: Unable to locate package liberasurecode-dev
./stack.sh: line 501: generate-subunit: command not found
https://bugs.launchpad.net/devstack/+bug/1547379”

The solution is: this is not actually a bug; it happens when DevStack is unable to install some packages (in my case, liberasurecode-dev). Editing /etc/apt/sources.list to enable installing from trusty-backports (on an Ubuntu server, or the corresponding backports repository on another distribution) fixed the issue.

You probably won’t have any problems here; just be sure the local.conf file is right and the stack should work just fine. When stack.sh finishes, it will show a web address where you can access Horizon [5] and log in with the user and password you specified in local.conf.

Enable communication between DevStack and instances

Okay, now we need to create a cluster so we can run a job on it, right? But before that, we’ve got to make sure our DevStack instance can communicate with the cluster’s instances.

We can do this with the following commands in the DevStack VM:

sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
sudo route add -net [private_network] gw [openstack_route]

You can get the [private_network] and [openstack_route] through Horizon in the Network menu or through the API as well.
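
For example, assuming the OpenStack CLI is available on the DevStack VM (the resource names below are the usual DevStack defaults and may differ in your setup), you can look these values up with:

openstack network list                 # lists the networks, including the private one
openstack subnet show private-subnet   # shows the CIDR of the private network
openstack router show router1          # shows the router details, including its gateway info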

Security groups

Also, an important thing to do is to add SSH, TCP, ICMP rules to the default security group and make sure that all the instances in the clusters belong to this group. If you want to you can create another security group, it’s fine, as long as you add the needed rules and make sure that the instances belong to this group.

You can do this easily through Horizon or API [6].

Run a MapReduce Job

Create a Hadoop cluster

In order to create a Hadoop cluster you have to follow a few preliminary steps, which are listed and described below. You can get some more info about how to do this at [7].

Register Image

In this tutorial, we’re going to use Hadoop 2.7.1 and this image.

  1. Download Image
    1. wget http://sahara-files.mirantis.com/images/upstream/newton/sahara-newton-vanilla-2.7.1-ubuntu.qcow2
  2. Create Image with glance
    1. openstack image create sahara-newton-vanilla-2.7.1-ubuntu \
      --disk-format qcow2 \
      --container-format bare \
      --file sahara-newton-vanilla-2.7.1-ubuntu.qcow2
  3. Register Image
    1. openstack dataprocessing image register sahara-newton-vanilla-2.7.1-ubuntu --username ubuntu
      Important: the username must be ubuntu
  4. Add hadoop (vanilla) 2.7.1 tag
    1. openstack dataprocessing image tags add sahara-newton-vanilla-2.7.1-ubuntu --tags vanilla 2.7.1

Create node groups

I usually do this part through Horizon, so what I do is basically create two node groups: a master and a worker. Be sure to create both of them in the default security group, instead of Sahara’s default option, which is to create a new security group for the instances.

Master:  plugin: Vanilla 2.7.1, Flavor m1.medium (feel free to change this and please do if you don’t have enough memory or HD), available on nova, no floating IP

  • namenode
  • secondarynamenode
  • resourcemanager
  • historyserver
  • oozie
  • hiveserver

Worker:  plugin: Vanilla 2.7.1, Flavor m1.small (feel free to change this and please do if you don’t have enough memory or HD), available on nova, no floating IP

  • datanode
  • nodemanager

Create a cluster template

Very straightforward: just add a master and at least one worker to it and mark the Auto-configure option.

Launch cluster

Now you can launch the cluster and it should work just fine :). If any problem happens, just take a look at the logs and make sure that the instances can communicate with each other through SSH.

Running a Hadoop job manually

If you want to “test” the cluster you can run a Hadoop job manually; the process is described below and is totally optional. Here we’re running a wordcount job.

  1. SSH to master instance
  2. login as hadoop
    1. sudo su hadoop
  3. Create a hdfs input
    1. bin/hdfs dfs -mkdir -p [input_path]
  4. Add some file to it
    1. bin/hdfs dfs -copyFromLocal [path_to_some_file] [input_path]
  5.  Run job:
    1. bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount [input_path] [output_path]
      Important: [output_path] should not exist in HDFS!!!
  6. Get output
    1. bin/hdfs dfs -get [output_path] output
    2. cat output/*
  7. If you don’t need the HDFS output anymore delete it!
    1. bin/hdfs dfs -rm -r [output_path]

Run MapReduce job

FINALLY! So, we’ll use this file as an example and we’ll run the WordCount job. To keep things simpler, we’ll leave the job binary type as an internal job binary, so we’ll only change the data source. If you want to change the job binary type, the process is pretty much the same: just create it somewhere (Swift, HDFS or Manila) and define its URL.

Sahara gives you the addresses below. One thing that may be needed is to add some rules to the security group in order to allow access to these pages, which can be extremely helpful for debugging and for accessing logs:

Web UI: http://[master_ip]:50070
Oozie: http://[master_ip]:11000
Job History: http://[master_ip]:19888
YARN: http://[master_ip]:8088
Resource Manager: http://[master_ip]:8032

The API commands and details that I don’t show in this section can be seen at [3].

Create a job binary

  1. Just download the file in the link above
  2. Create an internal job binary using the file; also make sure the name of the job binary keeps the file extension (.jar, .pig, etc.), in this case: hadoop-mapreduce-examples-2.7.1.jar

If you want to use Swift, Manila or HDFS for the job binary there’s no problem.

Create a job template

Just choose a name for the job template, use the type MapReduce, and use the job binary that we created.

Create a data source

HDFS

  1. SSH to master instance
  2. login as hadoop
    1. sudo su hadoop
  3. Create a hdfs input
    1. bin/hdfs dfs -mkdir -p [input_path]
  4. Add some file to it
    1. bin/hdfs dfs -copyFromLocal [some_file] [input_path]
  5. Create data source for input
    1. select HDFS as type
    2. URL: [input_path]
  6. Create data source for output
    1. select HDFS as type
    2. URL: [output_path_that_does_not_exists]

The URL assumes that the path is under /user/hadoop; if it isn’t, please provide the whole path. Also, if you’re using an external HDFS, provide the URL as hdfs://[master_ip]:8020/[path] and make sure you can access it.

SWIFT

  1. Create a container
    1. this can be done through Horizon or API
  2. Add an input file in the container
    1. this can be done through Horizon or API
  3. Create data source for input
    1. select Swift as type
    2. URL: [container]/[input_path]
    3. user: [your user], in this case “admin”
    4. password: [your password], in this case “nova”
  4. Create data source for output
    1. select Swift as type
    2. URL: [container]/[output_path_that_does_not_exists]
    3. user:[your user], in this case “admin”
    4. password:[your password], in this case “nova”

MANILA

  1. If it is the first time you’re creating a share
    1. create the default type
      1. manila type-create default_share_type True
    2. create a share network
      1. manila share-network-create \
        --name test_share_network \
        --neutron-net-id %id_of_neutron_network% \
        --neutron-subnet-id %id_of_network_subnet%
        I used the private net for the shares and the cluster :).
  2. Create a share
    1. manila create NFS 1 --name testshare --share-network [name_of_network]
  3. Make this share accessible
    1. manila access-allow testshare ip 0.0.0.0/0 --access-level rw
      Important: you can use the master’s and workers’ IPs instead of 0.0.0.0/0; that is actually more recommended
  4. SSH to some instance that has access to the share
    1. sudo apt-get install nfs-common
    2. Mount the share
      1. sudo mount -t nfs [share_export_location] /mnt (you can get the export location with manila show testshare)
    3. Add an input to it
      1. cd /mnt
      2. mkdir [input_path]
      3. cp [some_file] [input_path]
    4. Unmount the share
      1. sudo umount -f /mnt
  5. Create data source for input
    1. select Manila as type
    2. URL: /[input_path]
  6. Create data source for output
    1. select Manila as type
    2. URL: /[output_path_that_does_not_exists]

Run job

  1. Choose the job template we’ve created as a job template
  2. Choose an input data source
  3. Choose an output data source
  4. Configure job correctly
    1. For the job we’re running the minimum conf. needed is:
      1. mapreduce.reduce.class = org.apache.hadoop.examples.WordCount$IntSumReducer
      2. mapreduce.map.output.value.class = org.apache.hadoop.io.IntWritable
      3. mapreduce.map.class = org.apache.hadoop.examples.WordCount$TokenizerMapper
      4. mapreduce.map.output.key.class = org.apache.hadoop.io.Text
      5. mapred.reducer.new-api = true
      6. mapred.mapper.new-api = true
        Important: we need these last configurations (5, 6) because we’re using the new Hadoop API; the other configurations are self-explanatory.
    2. If you’re running a Hadoop 1.2.1 job, you would need to configure
      1. mapred.mapoutput.key.class
      2. mapred.mapoutput.value.class
      3. mapred.reducer.class
      4. mapred.mapper.class

Problems? Check the web UI logs; they can be very helpful!

Run a Spark job

IMPORTANT: from this point on I’ll not detail how to create data sources, because it is exactly the same procedure shown for Hadoop.

Create a Spark cluster

Register Image

In this tutorial, we’re going to use Spark 1.6.0 and this image.

  1. Download Image
    1. wget http://sahara-files.mirantis.com/images/upstream/mitaka/sahara-mitaka-spark-1.6.0-ubuntu.qcow2
  2. Create Image with glance
    1. openstack image create sahara-mitaka-spark-1.6.0-ubuntu \
      --disk-format qcow2 \
      --container-format bare \
      --file sahara-mitaka-spark-1.6.0-ubuntu.qcow2
  3. Register Image
    1. openstack dataprocessing image register sahara-mitaka-spark-1.6.0-ubuntu --username ubuntu
      Important: the username must be ubuntu
  4. Add spark 1.6.0 tag
    1. openstack dataprocessing image tags add sahara-mitaka-spark-1.6.0-ubuntu --tags spark 1.6.0

Create node groups

In this tutorial, I’ll just make one node group that has both the master and the worker processes; I’ll call it All-in-one.

All-in-one:  plugin: Spark 1.6.0, Flavor m1.medium (feel free to change this and please do if you don’t have enough memory or HD), available on nova, no floating IP

  • namenode
  • datanode
  • master
  • slave

Create a cluster template

Very straightforward: just add the node group All-in-one and mark the Auto-configure option.

Launch cluster

Now you can launch the cluster and it should work just fine :). If any problem happens, just take a look at the logs and make sure that the instances can communicate with each other through SSH.

Run a Spark job

For this part, we’ll use this job binary, which is a wordcount job. Create the job binary exactly as you created the Hadoop job binary, and choose it as the main binary for the job template.

At this point you’re basically good to go! You’ll need input and output data sources; you can create them exactly as we did with Hadoop.

Run job

  1. Choose the job template we’ve created as a job template
  2. Configure job correctly
    1. main class
      For this job the main class is: sahara.edp.spark.SparkWordCount
    2. configs
      For the Swift data source type you may need to pass the credentials as configs, for example:
      • fs.swift.service.sahara.password = nova
      • fs.swift.service.sahara.username = admin
    3. args
      Now, the biggest difference between Spark and Hadoop, is that the data sources are passed as args. So to run with a general data source you should pass as args:

      • datasource://[name of the data source]
      • datasource://[name of the data source]

       

    4. Run!

Run a Storm job

Create a Storm cluster

Register Image

In this tutorial, we’re going to use Storm 0.9.2 and unfortunately there’s no image available, but don’t be sad! Sahara-image-elements makes it easy to create an image for Storm and other plugins; just follow the instructions, generate your image and then come back here!

  1. Generate the image with sahara-image-elements
    1. it probably will generate something like: ubuntu_sahara_storm_latest_0.9.2
  2. Create Image with glance
    1. openstack image create ubuntu_sahara_storm_latest_0.9.2 \
      --disk-format qcow2 \
      --container-format bare \
      --file ubuntu_sahara_storm_latest_0.9.2
  3. Register Image
    1. openstack dataprocessing image register ubuntu_sahara_storm_latest_0.9.2 --username ubuntu
      Important: the username must be ubuntu
  4. Add storm 0.9.2 tag
    1. openstack dataprocessing image tags add ubuntu_sahara_storm_latest_0.9.2 --tags storm 0.9.2

Create node groups

In this tutorial, I’ll create a master and a worker; for some reason, Storm fails when both components are on the same node.

Master:  plugin: Storm 0.9.2, Flavor m1.medium (feel free to change this and please do if you don’t have enough memory or HD), available on nova, no floating IP

  • zookeeper
  • nimbus

Worker:  plugin: Storm 0.9.2, Flavor m1.small (feel free to change this and please do if you don’t have enough memory or HD), available on nova, no floating IP

  • supervisor

Create a cluster template

Very straightforward: just add a master and at least one worker to it and mark the Auto-configure option.

Launch cluster

Now you can launch the cluster and it should work just fine :). If any problem happens, just take a look at the logs and make sure that the instances can communicate with each other through SSH.

Run Storm job

For this part, we’ll use this job binary; it is one of the examples from storm-examples, called ExclamationTopology, and it doesn’t have any real use, but it is a good test job binary! Create the job binary exactly as you created the Hadoop job binary, and choose it as the main binary for the job template.

At this point you’re basically good to go! Storm doesn’t need data sources!

Run job

  1. Choose the job template we’ve created as a job template
  2. Configure job correctly
      1. main class
        For this job the main class is: storm.starter.ExclamationTopology
  3. That’s it! Run!
    1. It will literally run forever (if you want to stop it, just kill it: using Storm or Sahara)

 

References


what is stevedore? and how is it related to Sahara?

Before we start…

This post is considering that:

  1. You know Sahara :)!
  2. You don’t know stevedore :(, or you know it, but feel like you should know more about it or how it relates to Sahara

So if you don’t know Sahara, I can introduce you two: here is Sahara. I’ll let you get to know each other.

First, what is stevedore?

“Python makes loading code dynamically easy, allowing you to configure and extend your application by discovering and loading extensions (“plugins”) at runtime. Many applications implement their own library for doing this, using __import__ or importlib. stevedore avoids creating yet another extension mechanism by building on top of setuptools entry points. The code for managing entry points tends to be repetitive, though, so stevedore provides manager classes for implementing common patterns for using dynamically loaded extensions.”

If you’re not familiar with some concepts like setuptools, entry points, and how these concepts could be used to make the process of loading code dynamically easier, this link can be very helpful.

So now we kind of know what stevedore is: “stevedore manages dynamic plugins for Python applications”. stevedore is “independent” of OpenStack; it’s used by a lot of OpenStack projects, but any Python code can use it :)!

Okay… but what exactly are Plugins and why should I use them?

Plugins are software components that add features to an existing computer program (core code). “When a program supports plug-ins, it enables customization.”

In other words, a plugin is a piece of code that is not part of the core code, and because of this it can be added or removed easily. And it provides some features, services and operations in a more specific way.

And we should use plugins because:

  • We already talked about customization, that means that using plugins you can have “different versions” of the code in an easier way. If I don’t need all the plugins I can install just the plugins and dependencies I need.
  • We will have an improved design. “Keeping a separation between core and extension code encourages you to think more about abstractions in your design”.
  • “Plugins are a good way to implement device drivers and other versions of the Strategy pattern. The application can maintain generic core logic, and the plugin can handle the details for interfacing with an outside system or device.”
  • “Plugins also provide a convenient way to extend the feature set of an application by hooking new code into well-defined extension points. And having such an extensible system makes it easier for other developers to contribute to your project indirectly by providing add-on packages that are released separately.” 

And how does Sahara use stevedore?

Let’s open Sahara source code and have a look, here is the link: https://github.com/openstack/sahara

As we discussed, plugins are loaded through entry points. The configuration can be seen in a file called setup.cfg; search for [entry_points], where the syntax is:

[entry_points]
namespace =
    name = module.path:importable_from_module

From the names of the namespaces we can infer that there is a driver responsible for SSH, that console scripts are implemented with stevedore, and that cluster plugins, a Heat engine and some other things are loaded this way.
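
As an illustration only (the namespace and plugin names below are made up, not Sahara’s real ones), loading one of these plugins with stevedore looks roughly like this:

from stevedore import driver

# load a single plugin registered under a (hypothetical) namespace/name
# declared in setup.cfg, and instantiate it (invoke_on_load=True)
mgr = driver.DriverManager(
    namespace='example.data_source.types',   # hypothetical namespace
    name='hdfs',                              # hypothetical plugin name
    invoke_on_load=True,
)
print(mgr.driver)   # the loaded plugin instance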

Feel free to explore these namespaces on your own and see how they work! I may add more details about this with the time :)!

References

http://docs.openstack.org/developer/stevedore/
http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/advanced_pylons/entry_points_and_plugins.html
https://www.youtube.com/watch?v=U53ND5NucYY


Welcome to Outreachy!

Hi everyone,

In this post I’m going to talk about the Outreachy program, some tips that may help you get accepted into it, and a little about my project this winter with OpenStack.

Hope this can be helpful! Enjoy!

What is Outreachy?

“Outreachy helps people from groups underrepresented in free and open source software get involved. We provide a supportive community for beginning to contribute any time throughout the year and offer focused internship opportunities twice a year with a number of free software organizations.”

So, in practice Outreachy is very similar to the very popular GSoC program: applicants submit applications in order to possibly work on some open source software for ~3 months, and the process and timeline are very similar; you can get more details in the first link of this post. But one important thing to notice is the differences between these two programs:

  • Well, the main difference is that Outreachy is currently open only to women (cis and trans), trans men, and genderqueer people.
  • Outreachy has an application template (independent of the project that you’re applying for), which I particularly think is very cool, because you know that you’re giving them the exact information they need to know if you’re able to participate 🙂
  • Outreachy also offers projects involving documentation, design, and research along with coding projects, unlike Google Summer of Code. I actually had no idea about this, so thanks bee2502 for the correction and the information :)!
  • If you want to apply to Outreachy and work directly with code, you’ll need to have some merged code in the open source project you’re applying to; but calm down, it can be an easy bug or a documentation issue, basically anything merged. I think this makes a lot of sense and makes applicants get to know the project they’re applying to work on a little more! At GSoC you don’t have to do this. Don’t worry too much about it; it is very likely that the community will be very pleased to help you with this, and you can talk to your prospective mentor to see if she/he can help you somehow!
  • Outreachy happens twice a year: winter and summer :)! So maybe if summer doesn’t work out for you winter may. This happened to me, I wasn’t accepted, then applied at winter and got accepted, so… dream on!

As I said, you can get more info at the first link of this post, and if you’re interested in applying for Outreachy, please have a look at it!

Some valuable tips

Here are some tips and things I learned with my experience.
About my experience: I applied for Outreachy summer 2016 and GSoC 2016 and unfortunately didn’t get to work on the projects; I applied again for the Outreachy winter round and was chosen.

Some things I learned with these experiences were:

  • Writing the application, contacting mentors, getting to know the community: all of this TAKES TIME! So don’t do it in the last week and just hope everything works out; that makes it a lot harder. And know that these things will, for sure, take some time.

    TIP: inform yourself about the schedule, and always try to do things like writing your submission, submitting it, talking to your mentor and getting to know the community as early as possible. Don’t get me wrong, don’t do things in a rush; what I’m saying is that if you know you want project X, show interest in it and inform yourself about it as soon as you can. Don’t wait weeks if you can get things done now, because starting sooner will definitely improve your chances! Also, the sooner you apply, the sooner someone can give you feedback on your submission!

  • Applying to a lot of projects is a hard thing to do, and harder to do it right. 

    TIP: Choose a target, don’t just write a lot of submissions because you can. This is a common thing to do: “I can apply for multiple projects, so why not? it will increase my chances :)”, but I’ll try to convince you that this is not the best thing to do. Yes, you can apply for multiple projects of multiple organizations, but it TAKES TIME, so instead of making 2 or 3 good submissions, try to work on one awesome submission! You’ll have more time to get to know the community, to really understand your project and to write a good submission.

  • TIP: Talk to your possible mentor; don’t just assume she/he is always checking your submission or code contributions, get up, show some work! And of course, understand that mentors have other obligations, so don’t always expect quick answers, and show some empathy.
  • FINAL TIP: Don’t give up if you really want a project, just keep trying; eventually some result will come from it. If not Outreachy, some other sort of good result will, this I know for sure!

My Outreachy project Winter 2016

So, as I said I was accepted for Outreachy this winter, and I’m very grateful for this. I’m working with OpenStack, more specifically with Sahara.

There are a lot of definitions of OpenStack, so to be sure you’ll get the spirit of it, some definitions/introductions are:

“OpenStack is a set of software tools for building and managing cloud computing platforms for public and private clouds”

“OpenStack lets users deploy virtual machines and other instances that handle different tasks for managing a cloud environment on the fly.”
“OpenStack is a free and open-source software platform for cloud computing, mostly deployed as an infrastructure-as-a-service (IaaS). The software platform consists of interrelated components that control diverse, multi-vendor hardware pools of processing, storage, and networking resources throughout a data center.”

Of course, OpenStack is a big project and can’t be fully described in just a few lines; you can get more details about it at the second link of this post.

If you search a little bit about OpenStack, you’ll see that OpenStack is actually composed of a bunch of smaller projects, like Neutron, Nova, Keystone, and so on… One of these projects is Sahara! “The sahara project aims to provide users with a simple means to provision data processing frameworks (such as Hadoop, Spark and Storm) on OpenStack.”.

In other words, Sahara aims to offer a fast and easy way to create and manage clusters and to run jobs (Hadoop, Spark, Storm) with different kinds of data sources. Of course, Sahara offers all of this through an infrastructure managed by OpenStack, and the user doesn’t have to deal with internal details (unless she/he wants to).

So, given this context, I can talk a little bit about the project I’m working on for Outreachy. As I said, Sahara supports running jobs with different kinds of data sources; currently Sahara allows remote HDFS, local HDFS, Swift, and Manila as data sources. However, there’s no clean abstraction around them, and the code to deal with them is often very difficult to read and modify. I’m responsible for creating a clean abstraction of a DataSource that each data source type can implement differently depending on its own needs (a rough sketch of the idea is shown below).
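
Just to give an idea of the shape such an abstraction could take, here is a rough, hypothetical sketch; the class and method names are invented for illustration and are not Sahara’s actual interface.

import abc

import six   # OpenStack code usually supports Python 2 and 3 via six


@six.add_metaclass(abc.ABCMeta)
class DataSourceType(object):
    """Hypothetical base class: one subclass per data source type."""

    @abc.abstractmethod
    def validate(self, data_source):
        """Check that the data source URL and credentials make sense."""

    @abc.abstractmethod
    def get_runtime_url(self, data_source, cluster):
        """Return the URL the job should actually use on the cluster."""


class HDFSType(DataSourceType):
    """Hypothetical HDFS implementation of the abstraction."""

    def validate(self, data_source):
        if not (data_source.url.startswith("hdfs://") or data_source.url.startswith("/")):
            raise ValueError("not an HDFS path: %s" % data_source.url)

    def get_runtime_url(self, data_source, cluster):
        # a local HDFS path can be used by the job as-is
        return data_source.url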


URI – Warm up Contest to OBI 2016 -second phase

Hey there,

Not so long ago the Warm-up Contest for OBI 2016 took place, and I was one of the authors again (I was one of the authors in the first phase as well)!

I wrote 1 problem: Rio 2016
I’ll write a quick editorial for this problem and, hopefully, it will help someone :)!

Rio 2016

By: Marianne Linhares

This problem is very straightforward: reading the statement carefully, it is easy to see that we should calculate the distance between the points and then compare it with the time at which the match will begin.

Buuuut, be careful with the use of integers: the multiplication may not fit in an int, so you should use long long int (in C++) for the position variables.

A C++ solution can be seen here.

  • Complexity: O(n).

My first Chrome Extension – SpotifYoutube

Hello,

Stop the world, there’s something very weeeird happening… after a couple of very busy months I’ve got the time to start (and finish, I know, I’m as shocked as you guys) my first Chrome extension.

It’s a simple extension that adds a button to YouTube that searches Spotify for the title of the video you’re watching. I’m very proud of this; it’s working properly, and YouTube is love, Spotify is love, so this extension could not be different!


Hope you guys enjoy it!

Source code, Chrome Web Store!

