![](https://cdn.prod.website-files.com/6667183066b4b3ca735c66e0/6668bc532b9850a2bdc351af_duke.png)
        learning_rate=0.1,
        n_estimators=2,
        n_jobs=-1,
        colsample_bytree=0.05
    )

    # training the model on every column after the first two
    model.fit(X_train.iloc[:, 2:], y_train.iloc[:, 2:])

    # saving model
    model_pathname = Path(model_directory_path) / "model.joblib"
    print(f"Saving model in {model_pathname}")
    joblib.dump(model, model_pathname)


def infer(X_test: pd.DataFrame, model_directory_path: str = "resources") -> pd.DataFrame:
    # loading the model saved by the train function at previous iteration
    model = joblib.load(Path(model_directory_path) / "model.joblib")

    # creating the predicted label dataframe with correct dates and ids
    y_test_predicted = X_test[["date", "id"]].copy()
    # predicting on the same column slice that was used for training
    y_test_predicted["value"] = model.predict(X_test.iloc[:, 2:])

    return y_test_predicted
The cross-section forecast problem is a difficult but fascinating challenge in finance: predicting the relative performance of a group of investments over time. It arises because directly predicting the future price of a single asset is very hard, so instead we try to predict how different investments will perform relative to one another. This is done by tracking a pool of investment vehicles over time. The pool is usually defined by some rule, as with the S&P 500, an index of 500 of the largest publicly listed US companies. The challenge of the competition is to rank the investments from best to worst at each given date. The scoring function is based on Spearman's rank correlation, which measures how closely the predicted ranking of the investments matches the actual ranking.
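To make the scoring idea concrete, here is a minimal sketch of a per-date Spearman score, computed as the Pearson correlation of the ranks. The `date`, `id`, and `value` column names follow the `infer` function above. The helper name `spearman_by_date`, the assumption that the true targets use the same three columns, and the toy data are all illustrative, and the competition's actual scoring code may aggregate or weight dates differently.

```python
import pandas as pd


def spearman_by_date(y_true: pd.DataFrame, y_pred: pd.DataFrame, target: str = "value") -> pd.Series:
    """Hypothetical helper: Spearman correlation of predictions vs. targets, per date."""
    # Align predictions and targets on (date, id); the shared target column
    # becomes "<target>_true" and "<target>_pred" after the merge.
    merged = y_true.merge(y_pred, on=["date", "id"], suffixes=("_true", "_pred"))

    scores = {}
    for date, group in merged.groupby("date"):
        # Spearman correlation is the Pearson correlation of the ranks.
        true_ranks = group[f"{target}_true"].rank()
        pred_ranks = group[f"{target}_pred"].rank()
        scores[date] = true_ranks.corr(pred_ranks)
    return pd.Series(scores, name="spearman")


# Toy data (made up): the ranking is perfect on date 0 and fully reversed on date 1.
y_true = pd.DataFrame({
    "date": [0, 0, 0, 1, 1, 1],
    "id": ["a", "b", "c", "a", "b", "c"],
    "value": [0.1, 0.2, 0.3, 0.3, 0.2, 0.1],
})
y_pred = pd.DataFrame({
    "date": [0, 0, 0, 1, 1, 1],
    "id": ["a", "b", "c", "a", "b", "c"],
    "value": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
})

scores = spearman_by_date(y_true, y_pred)
print(scores)
print("mean score:", scores.mean())
```

Only the ordering within each date matters here: the predictions are on a completely different scale from the targets, yet date 0 still scores as a perfect match.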
DataCrunch uses the quantitative research of the CrunchDAO to manage its systematic market-neutral portfolio. DataCrunch built a dataset covering thousands of publicly traded U.S. companies.
The long-term strategic goal of the fund is capital appreciation by capturing idiosyncratic return at low volatility.
To achieve this goal, DataCrunch needs the community to assess the relative performance of all assets in a subset of the Russell 3000 universe. In other words, DataCrunch expects your model to rank the constituents of its investment universe.
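As an illustration of what "ranking the constituents" can look like in practice, the sketch below converts raw model outputs into a cross-sectional percentile rank within each date. The `date`, `id`, and `value` columns mirror the `infer` output above. The helper name `to_cross_sectional_rank` and the sample numbers are hypothetical, and whether the platform expects ranks or raw scores is not stated here.

```python
import pandas as pd


def to_cross_sectional_rank(predictions: pd.DataFrame, target: str = "value") -> pd.DataFrame:
    """Hypothetical helper: replace raw predictions with per-date percentile ranks."""
    ranked = predictions.copy()
    # rank(pct=True) maps each value to its rank divided by the group size,
    # so the best asset on each date gets 1.0.
    ranked[target] = ranked.groupby("date")[target].rank(pct=True)
    return ranked


# Toy predictions (made up): three assets on date 0, two on date 1.
raw = pd.DataFrame({
    "date": [0, 0, 0, 1, 1],
    "id": ["a", "b", "c", "a", "b"],
    "value": [0.5, -1.0, 2.0, 10.0, 3.0],
})

ranked = to_cross_sectional_rank(raw)
print(ranked)
```

Because Spearman's rank correlation depends only on ordering, this transformation leaves a rank-based score unchanged. Its main use is to put predictions from different dates on a common scale.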
This rally is a new iteration on DataCrunch's dataset called master-v3.
The DataCrunch tournament is the legacy tournament from which the CrunchDAO adventure started. It will soon be deprecated in favor of the DataCrunch competition.
DataCrunch is a Systematic Long-Short Fund and a client of the CrunchDAO community.
As a participant, you will be asked to build models on DataCrunch's dataset.
Members are remunerated in $CRUNCH according to the correlation of their predictions with the stock market.
DataCrunch pays the CrunchDAO in $CRUNCH, which is redistributed to the participants.
The data crunchers receive payments according to their position on the leaderboard.
Crunch Foundation represents the cutting edge of financial technology, merging blockchain and AI to democratize access to advanced analytical tools and drive global economic progress.
The machine learning competitions at Crunch Foundation are notable for their complexity in addressing real-world financial market problems. Additionally, they foster a strong sense of camaraderie among the participants.
I have been involved in Crunch since day 1. Over the last 4 years, I have watched the team build and adapt to an ever-changing market. I am proud to be part of this. In web3, most projects overpromise and underdeliver. And then there is Crunch.